May 14, 2012

Exports data error - The blogger's white paper

There was a problem with the reporting of merchandise exports data a few months ago. It was blamed on a computer crash. The Commerce Secretary retorted with a 'mistakes do happen' line. I personally never bought that, knowing very well that no 'crash' can do what happened. A very well written piece, with a scathing attack on the Commerce Secretary, can be seen here. It is a must-read for perspective on the issue. The magnitude of the error is logically reasoned out below (from the article), and that's the key to a lot of answers. It reads:

"... Fourth, the real goof-up is not merely $9.4 billion, but much bigger. For example, the figures given out on Friday spoke of a $15 billion over-reporting of engineering exports, and a $12 billion underestimation in the case of petroleum and gems and jewellery. The net figure may be $9.4 billion, but what has really happened is a $27 billion error – since one error in engineering and another in petroleum and gems cannot really cancel each other out. ...."

The Govt. decided to bring out a 'white paper' in January 2012. I am not sure what happened to it or what stage it is at, if anyone is actually working on it at all.

I thought of doing some Sherlock Holmes work, armed with the modern internet and some knowledge of how things work. So here's the story, as I see it. Let me know if you have a better one!

The exports data is collected from the Shipping Bills (a document) that are filed with customs. For imports, a Bill of Entry is filed with customs. These documents carry the ITC-HS code of the product to identify/classify the goods, the FOB/CIF value of the exports/imports in the relevant currencies, and some other details. The documents are filed electronically. The entire database of shipping bills and bills of entry is kept on the centralized servers of the Customs department (under CBEC/Ministry of Finance). They have two database servers in Delhi (a main server and a backup server that mirrors the main), plus an additional disaster-recovery server kept at some other location. These servers hold all the information that is filed at the various customs ports, on a current basis. A few transactions at some non-computerized ports are added manually later on, but their share is negligible (and will be reduced to zero in the coming years). 'DG Systems and Data Management' (DG-Systems), a body under the Central Board of Excise and Customs (CBEC), is responsible for maintaining the database and servers. More details can be found here. DG-Systems/CBEC belongs to the Ministry of Finance.
The Directorate General of Commercial Intelligence and Statistics (DGCIS), Kolkata, under the Ministry of Commerce, is the nodal agency mandated with the collection, analysis and dissemination of trade data. In the earlier days, they used to collect the manual shipping bill copies from the various customs ports and generate trade statistics. These days, they get the data from DG-Systems directly. In fact, DG-Systems gives them access to the database.

The bare database access is used by DGCIS to analyze and generate trade statistics. And this is where complications arise. The database is a huge pile of values stored in tables. To get it into some sane order and to manipulate the data, one has to use queries (e.g. SQL) and a data manipulation language (DML) to generate the data in the required form.
The best people to work on a database are the ones who actually created it and the structure inside. Second best are the people who are on the live systems and maintain the database. The last kind are the ones who have access but are not on the live systems and are less familiar with the structure. DGCIS happens to be of the last kind. In order to generate the data the way they want, they have to write the queries correctly. Otherwise, the database will throw up results that might not add up or make sense. This step is the most vulnerable to errors. The queries need to be tested thoroughly before being put to use.
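
As a rough illustration of what "tested thoroughly" could mean, here is a minimal Python sketch of one such check. It assumes (and this is purely an assumption) that the query can be made to return the list of bill numbers it actually aggregated, so that list can be compared against the raw table. The function name audit_rows and the sample bill numbers are hypothetical.

```python
from collections import Counter

def audit_rows(raw_bill_nos, reported_bill_nos):
    """Compare the bills in the raw table with the bills a query aggregated.

    Returns (duplicated, missing): bills counted more often than they
    occur in the raw data, and bills counted less often (or not at all).
    """
    raw = Counter(raw_bill_nos)
    reported = Counter(reported_bill_nos)
    # Counter returns 0 for unknown keys, so both comparisons are safe.
    duplicated = sorted(b for b in reported if reported[b] > raw[b])
    missing = sorted(b for b in raw if reported[b] < raw[b])
    return duplicated, missing

# Hypothetical example: bills 1 and 2 were counted twice, bill 3 was dropped.
duplicated, missing = audit_rows([1, 2, 3, 4], [1, 1, 2, 2, 4])
print("counted too often:", duplicated)
print("not counted:", missing)
```

A row-level check like this catches double counting and dropped rows even when the two happen to offset each other in the grand total, which a simple comparison of totals would not.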

Analyzing the errors in the reported data, it looks like the result of a wrong query. The tables generated contained numbers that looked reasonable but were off from the actual values. In some cases the query threw up the same number twice (leading to doubled figures) for some HS codes (e.g. engineering items), and in other cases some data rows were missed (e.g. petroleum/gems/jewellery). Only a wrong query can generate these kinds of errors. So the guy running the queries at DGCIS must have goofed up, for the story to be what it appears.
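
To show how a single wrong query can both double some figures and drop others, here is a small self-contained Python/SQLite sketch. Everything in it is made up for illustration: the table names (shipping_bills, hs_sector), the columns, the values, and the assumption that the fault lay in a join against a lookup table. The real customs schema and the real mistake are not known to me.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping_bills (bill_no INTEGER, hs_code TEXT, fob_usd REAL);
CREATE TABLE hs_sector (hs_code TEXT, sector TEXT);

-- Four made-up shipping bills with a true export total of 270
INSERT INTO shipping_bills VALUES
  (1, '8407', 100.0),
  (2, '8407', 50.0),
  (3, '2710', 80.0),
  (4, '7113', 40.0);

-- A flawed lookup table: code 8407 is listed twice, code 2710 is absent
INSERT INTO hs_sector VALUES
  ('8407', 'engineering'),
  ('8407', 'engineering'),
  ('7113', 'gems_jewellery');
""")

# Faulty query: the inner join doubles every 8407 bill and silently drops 2710.
faulty = dict(conn.execute("""
    SELECT s.sector, SUM(b.fob_usd)
    FROM shipping_bills b
    JOIN hs_sector s ON s.hs_code = b.hs_code
    GROUP BY s.sector
"""))

# Safer query: de-duplicate the lookup and keep unmatched bills visible.
fixed = dict(conn.execute("""
    SELECT COALESCE(s.sector, 'unmapped'), SUM(b.fob_usd)
    FROM shipping_bills b
    LEFT JOIN (SELECT DISTINCT hs_code, sector FROM hs_sector) s
      ON s.hs_code = b.hs_code
    GROUP BY COALESCE(s.sector, 'unmapped')
"""))

true_total = conn.execute("SELECT SUM(fob_usd) FROM shipping_bills").fetchone()[0]
net_error = sum(faulty.values()) - true_total

print("faulty:", faulty)
print("fixed: ", fixed)
print("net error in the headline total:", net_error)
```

In this toy data the faulty query reports engineering at 300 instead of 150 and leaves out the 80 of petroleum altogether, yet the headline total is off by only 70. The numbers look reasonable, nothing crashes, and the errors partly hide each other, which is the same pattern the quoted article describes.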

So in short, 'mistakes do happen'. A query error is a mistake. The question is, how do you make the process robust enough to avoid such errors in future? And is DGCIS the right body to analyze the database in today's world, when computerization of the inputs means the data no longer needs to be collected and compiled by hand? My personal take is, let DG-Systems do the data compilation/dissemination too. They just need to add a small team to their existing workforce for this task. The database is theirs; let them get directions from the Ministry of Commerce as to what is required, then run the queries and transfer the results. The mandate of DG-Systems needs revision in order to do that.

The second best solution is to develop an 'analytics team' in DGFT/Ministry of Commerce/DGCIS that can do such work on databases. This second best solution is actually better in the long run, as data trends and analysis are core to trade negotiations/policy making (ahem!), and are the forte of the Ministry of Commerce.

PS: It simply beats me why DGCIS charges money for data that they never collected and to which they hardly added any value! They should offer the summary historical data for free on their website. Is it because they are headquartered in Kolkata? Look at RBI and learn.


PPS: A crash would mean that you just have to run the query again once the computer/database is back up. No crash can generate errors like double entries and missed rows while leaving the other data points intact.