Thursday, May 8, 2008

Changes in Reports 10gR2 (10.1.2)

We’ve seen a lot of issues since we upgraded Oracle Reports from 10gR1 (9.0.4) to 10gR2 (10.1.2). As it turns out, they are caused by a big change in Reports 10gR2's architecture. The Oracle white paper "A Guide to Changed Functionality Between Oracle Reports 6i and 10g" says that 10gR2 stops using Reports Server clustering. The new version also no longer runs a stand-alone Reports Server as a Windows service; Reports Servers must be started and stopped through OPMN or OEM. If you have only one Reports Server on a machine and it is an in-process Reports Server, everything should be OK. However, our test/development environment has multiple stand-alone Reports Servers on one machine. We spent a few hours creating multiple stand-alone Reports Servers in 10gR2 but could not start them all up: sometimes the first one came up but not the second or the third. We finally found a Metalink document (Note 401271.1) that shows how to fix this.

However, even with all the Reports Servers up, users could not connect to them to generate reports. The error shown in the end-users' browsers was "REP-51002: Bind to Reports Server failed". After some googling I found another document (Note 413478.1) saying that the default broadcast mechanism must be changed from multicast to naming service. My guess is that because we had to re-install the Oracle software on the same machine several times, the default multicast mechanism no longer worked. After changing to the naming service, everything was back to normal.
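For reference, the multicast-to-naming-service switch from Note 413478.1 is made in rwnetwork.conf under $ORACLE_HOME/reports/conf on each machine. The fragment below is a sketch from memory of a 10.1.2 install — element and attribute names should be verified against your own copy of the file, and the host name and port here are placeholder values:

```xml
<!-- $ORACLE_HOME/reports/conf/rwnetwork.conf (sketch; verify against your install) -->
<discoveryService>
   <!-- Default mechanism: comment out the multicast entry...
   <multicast channel="228.5.3.1" port="14021" timeout="1000" retry="3"/>
   -->
   <!-- ...and enable the naming service instead (host and port are placeholders): -->
   <namingService name="Cos" host="reports-host.example.com" port="14022"/>
</discoveryService>
```

After editing the file, the Reports Servers need a restart (through OPMN or OEM) for the change to take effect.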

Wednesday, October 31, 2007

High I/O Wait in AIX

Someone in our department who monitors the AIX machines claimed that a particular machine was running at 70% I/O wait during a period when part of the application was running slower than normal. The assertion was that our system had an I/O bottleneck.

Given that the application was running on an AIX platform, I turned to sar for just such an investigation. The sar data showed a very low percentage of idle CPU and, surprisingly, the percentage of time the system spent waiting for I/O was about 60-70%. Although a high wait-I/O value from sar does not by itself indicate an I/O bottleneck, an I/O bottleneck would produce high wait-I/O percentages. Looking further at the I/O service times and the average wait queue in iostat showed that the I/O subsystem was performing very poorly. This was a SAN box with only 3 drives in RAID-5. After adding 3 more drives to this box, the I/O wait dropped substantially, to under 20%, and our application's performance has improved a bit since then. I felt more could still be achieved; after some googling, I found a suggestion that the ideal I/O wait should be under 10%.
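The kind of check described above can be scripted so it doesn't depend on eyeballing the numbers. Here is a small sketch that averages the %wio column of saved `sar -u` output; the sample lines are made up for illustration — on AIX you would feed in real sar output instead:

```python
# Illustrative only: the sample sar -u output below is invented.
sample = """\
12:00:01    %usr    %sys    %wio   %idle
12:00:06      20      10      65       5
12:00:11      18      12      62       8
12:00:16      22       9      68       1
"""

def average_wio(sar_output: str) -> float:
    """Average the %wio column of `sar -u` output (header line skipped)."""
    rows = [line.split() for line in sar_output.splitlines()[1:] if line.strip()]
    wio = [float(r[3]) for r in rows]   # 4th column is %wio
    return sum(wio) / len(wio)

avg = average_wio(sample)
print(f"average %wio = {avg:.1f}%")     # → average %wio = 65.0%
if avg > 10:
    print("I/O wait is above the ~10% comfort zone")
```

The 10% threshold is just the rule-of-thumb figure mentioned above, not a hard limit.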

I remembered reading in IBM's book AIX Performance Tuning that "a high % iowait indicates that the system has an application problem, a memory shortage, or an inefficient I/O subsystem configuration". The machine has enough memory (8 GB), and the I/O subsystem is now OK, so the remaining problem must be in our application. It was found that a lot of queries call PL/SQL functions, which can lead to performance degradation: even for a simple query, the application requested a lot of I/O because of the repeated function calls. There were also other application problems, such as missing indexes and not using bind variables. Our developers are now working hard to fix them.
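The function-call problem can be sketched outside Oracle. The snippet below uses Python's sqlite3 as a stand-in (table names and data are invented): a scalar function in the SELECT list fires once per row — in Oracle each such PL/SQL call can mean its own recursive query and extra I/O — while a join produces the same result set-based:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(id INTEGER, cust_id INTEGER);
    CREATE TABLE customers(id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'ACME'), (2, 'Initech');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1), (13, 2);
""")

customer_names = {1: "ACME", 2: "Initech"}
calls = 0

def cust_name(cust_id):
    """Stand-in for a PL/SQL function: one extra lookup per invocation."""
    global calls
    calls += 1
    return customer_names[cust_id]

conn.create_function("cust_name", 1, cust_name)

# Function-per-row style: cust_name() fires once for every order row.
slow = conn.execute("SELECT id, cust_name(cust_id) FROM orders").fetchall()
print(f"function was called {calls} times for {len(slow)} rows")

# Set-based style: one join, no per-row function calls at all.
fast = conn.execute("""SELECT o.id, c.name FROM orders o
                       JOIN customers c ON c.id = o.cust_id""").fetchall()
assert sorted(slow) == sorted(fast)   # same result, far less work per row
```

With 4 rows the difference is invisible; with millions of rows, the per-row calls are where the extra I/O the post describes comes from.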