Change history MFA home Neals Homepage

This page documents my investigation into a peculiar and insideous bug that started causing erratic behaviour of the MFA GUI. One symptom was that on MacOS the interface would fail to appear when double clicking of a .mfax file to open it, there was also a very occasional odd behaviour of some of the dynamically generated editable text fields in lists etc.

Symptom

The MFA tool fails to open gui on (primarily on) Mac, although the jvm and most application threads are running. The command line reported an array out of bounds exception:

Exception in thread "AWT-EventQueue-0" Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "AWT-EventQueue-0" java.lang.ArrayIndexOutOfBoundsException

Investigation

Further investigation using VisualVM revealed the actual problem being propagated from deep inside Swing. Here is the root of the 55 method deep call stack:

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at javax.swing.text.GlyphView.getBreakSpot(GlyphView.java:799)
    at javax.swing.text.GlyphView.getBreakWeight(GlyphView.java:724)
    at javax.swing.text.html.InlineView.getBreakWeight(InlineView.java:150)
    at javax.swing.text.FlowView$LogicalView.getPreferredSpan(FlowView.java:733)
    at javax.swing.text.FlowView.calculateMinorAxisRequirements(FlowView.java:233)
    at javax.swing.text.ParagraphView.calculateMinorAxisRequirements(ParagraphView.java:717)
    at javax.swing.text.html.ParagraphView.calculateMinorAxisRequirements(ParagraphView.java:157)
    at javax.swing.text.BoxView.checkRequests(BoxView.java:935)
    at javax.swing.text.BoxView.getMinimumSpan(BoxView.java:568)
    at javax.swing.text.html.ParagraphView.getMin
    ...

So a quick download of the GlyphView class to find out what is does... here is the start of the source file:

/**
 * A GlyphView is a styled chunk of text that represents a view
 * mapped over an element in the text model. This view is generally
 * responsible for displaying text glyphs using character level
 * attributes in some way.
 ...

And since java 7 the line 799 that causes the error is as follows:

   
startFrom = breaker.preceding(s.offset + (startFrom - pstart))
	+ (pstart - s.offset);

'startFrom' and 'pstart' are both local integers so it must be either 'breaker' or 's' that is null, so here is the bit of code before the problem (1st line of the loop):

// Breaker should work on the parent element because there may be
// a valid breakpoint at the end edge of the view (space, etc.)
Element parent = getElement().getParentElement();
int pstart = (parent == null ? start : parent.getStartOffset());
int pend = (parent == null ? end : parent.getEndOffset());

Segment s = getText(pstart, pend);
s.first();
BreakIterator breaker = getBreaker();
breaker.setText(s);

// Backward search should start from end+1 unless there's NO end+1
int startFrom = end + (pend > end ? 1 : 0);
for (;;) {
	startFrom = breaker.preceding(s.offset + (startFrom - pstart))
		+ (pstart - s.offset);
	if (startFrom > start) {
		// The break spot is within the view
		bs[ix++] = startFrom;
	} else {
		break;
	}
}

so 'breaker' and 's' are both dereferenced just before the exception and not subsequently changed... so they CANNOT POSSIBLY be null! Now I am really intrigued. After a bit of internet searching I found this which indicates a possible problem with the JIT compiler. Indeed by disabling the JIT compiler optimisation for the javax GlyphView.getBreakSpot method using:

java -XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot -jar marking.jar somefile.mfax 

The problem appears to go away... So why is the JIT optimiser causing a variable to suddenly become null? A bit more searching and this. It appears that the JIT compiler is producing a faulty optimisation which seems to manifest itself around the 5th time (iteration) the getBreakSpot function is called... Looking at the bytecode for line 799:

getBreakSpot bytecode:
       127: aload 10
       129: getfield #140 // Field javax/swing/text/Segment.offset:I
       132: iload 12
       134: iload 8
       136: isub
       137: iadd
       138: invokevirtual #141 // Method java/text/BreakIterator.preceding:(I)I
       141: iload 8
       143: aload 10
       145: getfield #140 // Field javax/swing/text/Segment.offset:I

we see (index 8 = pstart, index 12 = startFrom, 10 = s) so bci 127,129 and bci 143,145 are s.offset so here it is even more clear that bci 145 cannot fail unless the call in bci 138 is faulty...

An explantion: "In a compiled getBreakSpot we have WhitespaceBasedBreakIterator.preceding, WhitespaceBasedBreakIterator.adjacent and WhitespaceBasedBreakIterator.checkhit inlined. In checkhit we are getting a deoptimization because of a range_check trap at bci 138 of getBreakSpot. After this deoptimization the locals in getBreakSpot are null, and at bci 145 the getField will fail and throw NullPointerException. The locals are not null in the compiled code, since the getField at bci 129 succeeded. I'm looking into why the interpreter locals are null after deopt now. "

As at 2015-02-09 08:51 the java 8 bug report states: "This is a Sustaining managed bug. Further details will be added later."

Helpful!

Andreas Eriksson (Oracle) wrote :
1) Compiled code encounters a range check trap, and begins a deoptimization.
2.a) During deoptimization the compiled code believes that this range check will fail, and assumes the interpreter have to throw an ArrayIndexOutOfBoundsException directly after deopt.
2.b) Since there is no exception handler in the current method the exception will propagate upwards in the call stack. So we don't need any locals for the method(s) we are in now, and they are killed.
3.a) Once in the interpreter, there seems to be no exception, and it keeps running.
3.b) Since all locals were killed (and therefore set to 0 / NULL / Top)
we get a nullpointer exception once we need local 10 for an object lookup.

Later Andreas added more explanation about why the compiler wrongly believes the range check fails in terms of the expression graph:
"The problem is with type propagation in C2. A CmpU can use type information from nodes two steps up in the graph, when input 1 is an AddI or SubI node. If the AddI/SubI overflows the CmpU node type computation will set up two type ranges based on the inputs to the AddI/SubI instead. These type ranges will then be compared to input 2 of the CmpU node to see if we can make any meaningful observations on the real type of the CmpU. This means that the type of the AddI/SubI can be bottom, but the CmpU can still use the type of the AddI/SubI inputs to get a more specific type itself. The problem with this is that the types of the AddI/SubI inputs can be updated at a later pass, but since the AddI/SubI is of type bottom the propagation pass will stop there. At that point the CmpU node is never made aware of the new types of the AddI/SubI inputs, and it is left with a type that is based on old type information. When this happens other optimizations using the type information can go very wrong; in this particular case a branch to a range_check trap that normally never happened was optimized to always be taken."

Tobias Hartmann ( bugs.openjdk.java.net/browse/JDK-8080156 ) adds more about the compiler expression graph type problem (numbers are nodes indices in the graph):

The problem is wrong type information in C2. At the point where we compute the type of 

 174	CmpU	=== _ 173 42 [[ 175 ]] !jvms: ItoaNPE::doStuff @ bci:48 
      top 

the values of the other nodes look like this 

  20	ConI	=== 0 [[ 156 26 ]] #int:min 
      int:min 
 141	SubI	=== _ 230 41 [[ 156 ]] !jvms: ItoaNPE::doStuff @ bci:31 
      int 
 156	Phi	=== 100 141 20 [[ 173 158 ]] #int !jvms: ItoaNPE::doStuff @ bci:25 
      int:min 
 172	RShiftI	=== _ 171 42 [[ 173 ]] !jvms: ItoaNPE::doStuff @ bci:45 
      int:-1073741824..1073741823:www 
 173	SubI	=== _ 172 156 [[ 182 174 ]] !jvms: ItoaNPE::doStuff @ bci:47 
      int 

Node "156 Phi" should not have the type "int:min" if the two input nodes have types "int:min" and "int". The problem is that we use type information from two levels above to compute the type of CmpU. Therefore, we should visit the CmpU again if the type of 156 Phi is updated to the right value because otherwise we rely on wrong/incomplete type information.

As far as the MFA is concerned it seems that this "NPE seen in GlyphView very intermittently" is rather MORE intermittent than the java developers suggest. It is dependent on the exact sequence and type of text appearing in the GUI at startup, complicated by the threading. The exception would be thrown and ultimately either cause the interface to fail to appear if the problem occured before the first 'setVisible(true);' on the main application window, or sometimes cause parts of the interface to become unresponsive until the gui elements were re generated - typically by loading new data."

As an aside it actually turns out to be the same bug as bugs.openjdk.java.net/browse/JDK-8080156 "Integer.toString(int value) sometimes throws NPE" affecting (8u45, 9) and should soon be fixed in the java compiler 8u60 and is fixed with this is the change to the java compiler code as of Thu, 21 May 2015 16:49:11 +0200:

Workaround

The MFA workaround for now for java7 and existing 8 seems to be to disable the JIT compilation for the Swing GlyphView class:

java -XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot -jar marking.jar somefile.mfax

However this is inconvienient for users so I have added code to cause the MFA to re run itself with the additional parameter, before any GUI is produced by adding the following code to the start of the main method:

// reconstruct a command to re run java with the required compiler directive
// to prevent JIT compilation for the Glyph generator which fails randomly due to a 
// bug in Swing for java 7 and 8
// the macos package does not need this if the -XX parameter can be added in the App bubdle script 
//  and therefore simply needs "rerun" adding as a command line parameter in the script.
		
if (args.length==0 || !args[args.length-1].equals("rerun")){
	System.out.println("First run of MFA");
	try {  
		// re-launch with VM option passed 
		StringBuffer cla = new StringBuffer();
				
		String[] newargs = new String[5+args.length];
		newargs[0] = "java";
		newargs[1] = "-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot";
		newargs[2] = "-jar";
		newargs[3] = jarDir+File.separatorChar+"marking.jar";
		for (int i = 0; i<args.length; i++){
			newargs[4+i] = args[i];
		} 
		newargs[newargs.length-1] = "rerun";
				
		System.out.println("Launching second run");
		Runtime.getRuntime().exec(newargs); 
		System.out.println("Launched");
                
	} catch (IOException ioe) {  
		ioe.printStackTrace();  
	} 
	 
	System.exit(0);
} 

To run it without restarting the VM, the CL parameter "rerun" can be added to the end of the parameter list, and for the MacOS version that is bundled as an App with a script, the CL parameter is added in the script to avoid having to start the application twice.

All this because of a JIT compiler bug where "a branch to a range_check trap that normally never happened was optimized to always be taken"

Horrah !
Neal.