
Java Concurrency and Performance
Training Description
With the advent of multi-core processors the usage of single threaded programs is soon becoming obsolete. Java was built to be able to do many things at once. In computer lingo, we call this "concurrency".
This is the main reason why Java is so useful. Today we see a lot of our applications running on multiple cores, concurrent java programs with multiple threads is the answer for effective performance and stability on multi-core based applications. Concurrency is among the utmost worries for newcomers to Java programming but there's no reason to let it deter you. Not only is excellent documentation available but also pictorial representations of each topic to make understanding much graceful and enhanced. Java threads have become easier to work with as the Java platform has evolved. In order to learn how to do multithreaded programming in Java 6 and 7, you need some building blocks. Our training expert with his rich training and consulting experience illustrates with real application based case studies.
- The target group is programmers who want to know foundations of concurrent programming and existing concurrent programming environments, in order, now or in future, to develop multithreaded applications for multi-core processors and shared memory multiprocessors. 
Intended Audience
- This is an instructor led course provides lecture topics and the practical application of JEE5.0 and the underlying technologies. It pictorially presents most concepts and there is a detailed case study that strings together the technologies, patterns and design. 
Instructional Method
- Dealing with threads and collections on a multi-core/ multiprocessor. 
- To quickly identify the root cause of poor performance in your applications. 
- Eliminate conditions that will prevent you from finding performance bottlenecks. 
- JDK 5, 6, 7 which have features to harness the power of the underlying hardware. 
Key skills
- Basic knowledge of Java (introductory course or equivalent practical experience). 
Pre-requisites
- 01Why wait-notify require Synchronization - notifyAll used as work around 
- Structural modification to hidden queue by wait-notify 
- locking handling done by OS 
- use cases for notify-notifyAll 
- Hidden queue 
- design issues with synchronization 
 
- 02- Uncaught Exception Handler 
- problem with stop 
- Dealing with InterruptedStatus 
 
- 03- Real Meaning and effect of synchronization 
- Volatile 
- Sequential Consistency would disallow common optimizations 
- The changes in JMM 
- Final 
 Shortcomings of the original JMM - Finals not really final 
- Prevents effective compiler optimizations 
- Processor executes operations out of order 
- Compiler is free to reorder certain instructions 
- Cache reorders writes 
- Old JMM surprising and confusing 
 Instruction Reordering - What is the limit of reordering 
- Programmatic Control 
- super-scalar processors 
- heavily pipelines processors 
- As-if-serial-semantics 
- Why is reordering done 
 Cache Coherency - Write-back Caching explained 
- What is cache Coherence. 
- How does it effect java programs. 
- Software based Cache Coherency 
- NUMA(Non uniform memory access) 
- Caching explained 
- Cache incoherency 
 New JMM and goals of JSR-133 - Simple,intuitive and, feasible 
- Out-of-thin-air safety 
- High performance JVM implementations across architectures 
- Minimal impact on existing code 
- Initialization safety 
- Preserve existing safety guarantees and type-safety 
 
- 04- Safe Construction techniques 
- Thread Local Storage 
- Thread safety levels 
- UnSafe Construction techniques 
 
- 05CAS - Wait-free Queue implementation 
- Optimistic Design 
- Wait-free Stack implementation 
- Hardware based locking 
 ABA problem - Markable reference 
- weakCompareAndSet 
- Stamped reference 
 Reentrant Lock - ReentrantReadWriteLock 
- ReentrantLock 
 Lock Striping - Lock Striping on LinkNodes 
- Lock Striping on table 
 Indentifying scalability bottlenecks in java.util.Collection - segregating them based on Thread safety levels 
 Lock Implementation - Multiple user conditions and wait queues 
 Lock Polling techniques Based on CAS Design issues with synchronization 
- 06Weakly Consistent Iterators vs Fail Fast IteratorsConcurrentHashMap - Structure 
- remove/put/resize lock 
- Almost immutability 
- Using volatile to detect interference 
- Read does not block in common code path 
 
- 07- Atomicity 
- Confinement 
- Immutability 
- Visibility 
- Almost Immutability 
- Restructuring and refactoring 
 
- 08Thread confinement - Stack confinement 
- ThreadLocal 
- Unshared objects are safe 
- Ad-hoc thread confinement 
 Visibility - Synchronization and visibility 
- Non-atomic 64-bit numeric operations 
- Problems that state data can cause 
- Volatile vs synchronized 
- Single-threaded write safety 
- Volatile flushing 
- Making fields visible with volatile 
- Reason why changes are not visible 
 Immutability - Definition of immutable 
- Immutable is always thread safe 
- Immutable containing mutable object 
- Final fields 
 Safe publication - Making objects and their state visible 
- Safe publication idioms 
- How to share objects safely 
- "Effectively" immutable objects 
 Publication and escape - Publishing objects to alien methods 
- Publishing objects as method returns 
- Implicit links to outer class 
- Ways we might let object escape 
- Publishing objects via fields 
 
- 09Instance confinement - Split locks 
- Example of fleet management 
- Java monitor pattern 
- Lock confinement 
- Encapsulation 
- How instance confinement is good 
- State guarded by private fields 
 Documenting synchronization policies - Examples from the JDK 
- Documentation checklist 
- What should be documented 
- Synchronization policies 
- Interpreting vague documentation 
 Adding functionality to existing thread-safe classes - Benefits of reuse 
- Using composition to add functionality 
- Subclassing to add functionality 
- Modifying existing code 
- Client-side locking 
 Designing a thread-safe class - Pre-condition 
- Thread-safe counter with invariant 
- Primitive vs object fields 
- Encapsulation 
- Post-conditions 
- Waiting for pre-condition to become true 
 Delegating thread safety - Independent fields 
- Publishing underlying fields 
- Delegating safety to ConcurrentMap 
- Invariables and delegation 
- Using thread safe components 
- Delegation with vehicle tracker 
 
- 10- Semaphore 
- Latches 
- SynchronousQueue 
- Future 
- Exchanger 
- Synchronous Queue Framework 
- Mutex 
- Barrier 
 
- 11Finding exploitable parallelism - Callable controlling lifecycle 
- CompletionService 
- Limitations of parallelizing heterogeneous tasks 
- Callable and Future 
- Time limited tasks 
- Example showing page renderer with future 
- Sequential vs parallel 
- Breaking up a single client request 
 The Executor framework - Memory leaks with ThreadLocal 
- Delayed and periodic tasks 
- Thread pool structure 
- Motivation for using Executor 
- Executor lifecycle, state machine 
- Difference between java.util.Timer and ScheduledExecutor 
- ThreadPoolExecutor 
- Decoupling task submission from execution 
- Shutdown() vs ShutdownNow() 
- Executor interface 
- Thread pool benefits 
- Standard ExecutorService configurations 
 Execution policies - Various sizing options for number of threads and queue length 
- In which order? (FIFO, LIFO, by priority) 
- Who will execute it? 
 Executing tasks in threads - Disadvantage of unbounded thread creation 
- Single-threaded vs multi-threaded 
- Explicitely creating tasks 
- Indepence of tasks 
- Identifying tasks 
- Task boundaries 
 
- 12Stopping a thread-based service - Graceful shutdown 
- ExecutorService shutdown 
- Providing lifecycle methods 
- Asynchronous logging caveats 
- Example: A logging service 
- Poison pills 
- One-shot execution service 
 Task cancellation - Cancellation policies 
- Using flags to signal cancellation 
- Reasons for wanting to cancel a task 
- Cooperative vs preemptive cancellation 
 Interruption - Origins of interruptions 
- WAITING state of thread 
- How does interrupt work? 
- Methods that put thread in WAITING state 
- Policies in dealing with InterruptedException 
- Thread.interrupted() method 
 Dealing with non-interruptible blocking - Interrupting locks 
- Reactions of IO libraries to interrupts 
 Responding to interruption - Letting the method throw the exception 
- Saving the interrupt for later 
- Ignoring the interrupt status 
- Restoring the interrupt and exiting 
 Interruption policies - Task vs Thread 
- Different meanings of interrupt 
- Preserving the interrupt status 
 Example: timed run - Telling a long run to eventually give up 
- Canceling busy jobs 
 Handling abnormal thread termination - Using UncaughtExceptionHandler 
- Dealing with exceptions in Swing 
- ThreadGroup for uncaught exceptions 
 JVM shutdown - Shutdown hooks 
- Orderly shutdown 
- Daemon threads 
- Finalizers 
- Abrupt shutdown 
 
- 13Configuring ThreadPoolExecutor - Thread factories 
- corePoolSize 
- Customizing thread pool executor after construction 
- Using default Executors.new* methods 
- Managing queued tasks 
- maximumPoolSize 
- keepAliveTime 
- PriorityBlockingQueue 
 Saturation policies - Discard 
- Caller runs 
- Abort 
- Discard oldest 
 Sizing thread pools - Examples of various pool sizes 
- Determining the maximum allowed threads on your operating 
- system 
- CPU-intensiv vs IO-intensive task sizing 
- Danger of hardcoding worker number 
- Problems when pool is too large or small 
- Formula for calculating how many threads to use 
- Mixing different types of tasks 
 Tasks and Execution Policies - Long-running tasks 
- Homogenous, independent and thread-agnostic tasks 
- Thread starvation deadlock 
 Extending ThreadPoolExecutor - terminate 
- Using hooks for extensionafterExecutebeforeExecute 
- Using hooks for extension 
- afterExecute 
- beforeExecute 
 Parallelizing recursive algorithms - Using Fork/Join to execute tasks 
- Converting sequential tasks to parallel 
 
- 14Avoiding Liveness HazardsOther liveness hazards - Poor responsiveness 
- Livelock 
 Starvation - ReadWriteLock in Java 5 vs Java 6 
- Detecting thread starvation 
 Avoiding and diagnosing deadlocks - Adding a sleep to cause deadlocks 
- "TryLock" with synchronized 
- Using open calls 
- Verifying thread deadlocks 
- Avoiding multiple locks 
- Timed lock attempts 
- Stopping deadlock victims 
- DeadlockArbitrator 
- Deadlock analysis with thread dumps 
- Unit testing for lock ordering deadlocks 
 Deadlock - Thread-starvation deadlocks 
- Discovering deadlocks 
- Checking whether locks are held 
- Resource deadlocks 
- The drinking philosophers 
- Lock-ordering deadlocks 
- Defining a global ordering 
- Resolving deadlocks 
- Causing a deadlock amongst philosophers 
- Deadlock between cooperating objects 
- Imposing a natural order 
- Dynamic lock order deadlocks 
- Defining order on dynamic locks 
 Open calls and alien methods - Example in Vector 
 
- 15Thinking about performance - Mistakes in traditional performance optimizations 
- 2-tier vs multi-tier 
- Evaluating performance tradeoffs 
- Performance vs scalability 
- Effects of serial sections and locking 
- How fast vs how much 
 Reducing lock contention - How to monitor CPU utilization 
- Performance comparisons 
- ReadWriteLock 
- Using CopyOnWrite collections 
- Immutable objects 
- Atomic fields 
- Using ConcurrentHashMap 
- Narrowing lock scope 
- Avoiding "hot fields" 
- Hotspot options for lock performance 
- Reasons why CPUs might not be loaded 
- How to find "hot locks" 
 Lock splitting - Dangers of object pooling 
- Safety first! 
- Reducing lock granularity 
- Exclusive locks 
 Lock striping - In ConcurrentHashMap 
- In ConcurrentLinkedQueue 
 Amdahl's and Little's laws - Formula for Amdahl's Law 
- Problems with Amdahl's law in practice 
- Applying Little's Law in practice 
- Utilization according to Amdahl 
- Maximum useful cores 
- How threading relates to Little's Law 
- Formula for Little's Law 
 Costs introduced by threads - Context switching 
- Locking and unlocking 
- Cache invalidation 
- Spinning before actual blocking 
- Lock elision 
- Memory barriers 
- Escape analysis and uncontended locks 
 
- 16Lock and ReentrantLock - Using try-finally 
- Memory visibility semantics 
- Using try-lock to avoid deadlocks 
- tryLock and timed locks 
- Interruptible locking 
- Non-block-structured locking 
- ReentrantLock implementation 
- Using the explicit lock 
 Synchronized vs ReentrantLock - Memory semantics 
- Prefer synchronized 
- Ease of use 
 Performance considerations - Heavily contended locks 
- Java 5 vs Java 6 performance 
- Throughput on contended locks 
- Uncontended performance 
 Fairness - Standard non-fair mechanisms 
- Throughput of fair locks 
- Round-robin by OS 
- Barging 
- Fair explicit locks in Java 
 Read-write locks - ReadWriteLock interface 
- Understanding system to avoid starvation 
 ReadWriteLock implementation options - Release preference 
- Downgrading 
- Reader barging 
- Upgrading 
- Reentrancy 
 
- 17Explicit condition objects - Condition interface 
- Timed conditions 
- Benefits of explicit condition queues 
 AbstractQueuedSynchronizer (AQS) - Basis for other synchronizers 
 Managing state dependence - Exceptions on pre-condition fails 
- Structure of blocking state-dependent actions 
- Crude blocking by polling and sleeping 
- Example using bounded queues 
- Single-threaded vs multi-threaded 
 Introducing condition queues - With intrinsic locks 
 Using condition queues - Waking up too soon 
- Conditional waits 
- Condition queue 
- Encapsulating condition queues 
- State-dependence 
- notify() vs notifyAll() 
- Condition predicate 
- Lock 
- Waiting for a specific timeout 
 Missed signals - InterruptedException 
 
- 18Hardware support for concurrency - Using "Unsafe" to access memory directly 
- CAS support in the JVM 
- Compare-and-Set 
- Performance advantage of padding 
- Nonblocking counter 
- Simulation of CAS 
- Managing conflicts with CAS 
- Compare-and-Swap (CAS) 
- Shared cache lines 
- Optimistic locking 
 Atomic variable classes - Optimistic locking classes 
- How do atomics work? 
- Atomic array classes 
- Performance comparisons: Locks vs atomics 
- Cost of atomic spin loops 
- Very fast when not too much contention 
- Types of atomic classes 
 Disadvantages of locking - Priority inversion 
- Elimination of uncontended intrinsic locks 
- Volatile vs locking performance 
 Nonblocking algorithms - Scalability problems with lock-based algorithms 
- Atomic field updaters 
- Doing speculative work 
- AtomicStampedReference 
- Nonblocking stack 
- Definition of nonblocking and lock-free 
- Highly scalable hash table 
- The ABA problem 
 Using sun.misc.Unsafe - Dangers 
- Reasons why we need it 
 
- 19- Fork -join decomposition 
- Fork and Join 
- ParallelArray 
- Divide and conquer 
- Hardware shapes programming idiom 
- Exposing fine grained parallelism 
- Anatomy of Fork and Join 
- Limitations 
- Work Stealing 
 
- 20- Amdahl's Law 
 Cache - cache controller 
- write 
- Direct mapped 
- read 
- Address mapping in cache 
 Memory Architectures - NUMA 
- UMA 
 
- 21- Concurrent Stack 
- Harsh Realities of parallelism 
- Parallel Programming 
 Concurrent Objects - Sequential Consistency 
- Linearizability 
- Concurrency and Correctness 
- Progress Conditions 
- Quiescent Consistency 
 Concurrency Patterns - Lazy Synchronization 
- Lock free Synchronization 
- Optimistic Synchronization 
- Fine grained Synchronization 
 Priority Queues - Heap Based Unbounded Priority Queue 
- Skiplist based Unbounded priority Queue 
- Array Based bounded Priority Queue 
- Tree based Bounded Priority Queue 
 Lists - Coarse Grained Synchronization 
- Lazy Synchronization 
- Optimistic Synchronization 
- Non Blocking Synchronization 
- Fine Grained Synchronization 
 SkiplistsSpinlocks - Lock suitable for NUMA systems 
 Concurrent Queues - Unbounded lock-free Queue 
- Bounded Partial Queue 
- Unbounded Total Queue 
 Concurrent Hashing - Open Address Hashing 
- Closed Address Hashing 
- Lock Free Hashing 
 
- 22- CopyOnWriteArray(List/Set) 
 NonBlockingHashMap - For systems with more than 100 cpus/cores 
- State based Reasoning 
- all CAS spin loop bounded 
- Constant Time key-value mapping 
- faster than ConcurrentHashMap 
- no locks even during resize 
 Queue interfaces - Queue 
- BlockingQueue 
- Deque 
- BlockingDeque 
 Queue ImplementationsArrayDeque and ArrayBlockingDeque - WorkStealing using Deques 
 LinkedBlockingQueue LinkedBlockingDeque ConcurrentLinkedQueue - GC unlinking 
- Michael and Scott algorithm 
- Tails and heads are allowed to lag 
- Support for interior removals 
- Relaxed writes 
 ConcurrentLinkedDeque - Same as ConcurrentLinkedQueue except bidirectional pointers 
 LinkedTransferQueue - Internal removes handled differently 
- Heuristics based spinning/blocking on number of processors 
- Behavior differs based on method calls 
- Usual ConcurrentLinkedQueue optimizations 
- Normal and Dual Queue 
 Skiplist - Lock free Skiplist 
- Sequential Skiplist 
- Lock based Concurrent Skiplist 
 ConcurrentSkipListMap(and Set) - Indexes are allowed to race 
- Iteration 
- Problems with AtomicMarkableReference 
- Probabilistic Data Structure 
- Marking and nulling 
- Different Way to mark 
 
Topics
