Science lives and dies by data. Today, the field of water resources is in the era of Big Data. Most published research papers contribute a small amount of data, such as field observations, but many also rely on other large, usually public datasets and data services. What regional watershed study, for example, does not use standard GIS datasets of basin boundaries, elevations, and streams distributed by data services, i.e., Big Data? Ongoing initiatives may produce Big Data “apps” to examine critical issues such as climate change. How good is this Big Data? Should we trust it? Can we make it better? This is where peer-reviewed journals come in.
Critical though Big Data may be, its creators are not well recognized in published journals. Important insights learned in building Big Data often remain in the minds of its creators, and users are left to discover strengths and weaknesses for themselves. With peer-reviewed journal articles, the developers of Big Data have a chance to explain the role of their data services in furthering science. A journal also provides the opportunity for discussions and replies, which contribute to greater understanding of Big Data.
Writing about Big Data
So, how does one write about Big Data? The typical research article format of hypothesis-testing-conclusion is not a good fit for describing data services. Moreover, most Big Data already are documented with user guides and with metadata, such as the Federal Geographic Data Committee’s “Content Standard for Digital Geospatial Metadata (CSDGM).” User guides primarily are “how to” instructions with little nuance. The CSDGM is intended more for description and documentation than for evaluation and discussion. If you need to know the coding for the attributes of a data set, the CSDGM metadata is the place to look. If you need to see how and why these codes were used, it can be less helpful.
In thinking about how to write a journal article about your Big Data, I take it for granted you already have user guides and CSDGM metadata. These should be referenced, but need not be repeated except as an introduction. What more do you need to say to help researchers understand your Big Data? I see these four questions as critical:
- What was original about constructing your Big Data, and what lessons were learned?
- What assumptions did you make, and how might they affect using Big Data?
- How did you test Big Data?
- What are the known strengths and limitations of Big Data?
Every Big Data effort is original. If it were easy, someone already would have done it! The tipping point for proceeding typically is a new technology (e.g. LiDAR) or a pressing need that finally makes available the necessary resources. The mechanics of construction usually are described elsewhere. But all Big Data efforts involve design choices. Why did you choose the resolution you did? Why was a particular range of attributes chosen? Why does the interface work this way and not that way? The reasoning behind these decisions may help users a little, but they could be invaluable for later developers working on the next generation of Big Data.
I might call assumptions, “Where the bodies are buried.” All data systems are compromises. Time and resources always force developers to accept some things as “givens.” To use the National Hydrography Dataset (NHD) as an example, one big assumption of the early versions was that streams began with the “blue lines” of USGS maps. This was not a very good assumption, but there was nothing better available. Much of the subsequent criticism and misunderstanding of the NHD might have been softened had this point been made clearly in a journal article.
Most Big Data is tested extensively in the planning and development stages. The journal article should reference pilot studies or anything else that gives insight into how the Big Data can be used in practical situations. Keep in mind, testing information often resides in the “gray literature” of contractor reports; the journal article can be key to finding this information.
Finally, you need to be honest about the strengths and limitations of your Big Data. Nobody knows these better than the developers! What uses did you have in mind for Big Data, and how does it fulfill these hopes? Give potential users a reasonable expectation of how Big Data can help them. Don’t be afraid to advise on what should be made better in future versions of Big Data.
Technology for Technology
The text of a journal article may not be a very good venue for demonstrating a data system. The best user manuals today incorporate video. Somebody sits down in front of a computer and, with the video camera running, goes through an example application. Please note that the online version of JAWRA articles can link to such demonstrations.
The Role of Peer Review
The role of the journal article is to describe and critically examine Big Data. Once Big Data reaches the milestone of journal article preparation, the data system pretty much is what it is. Recommending major changes may not be helpful, except where Big Data simply fails to do what it claims to do. The focus of peer review, therefore, should be on how well the article answers the four questions I raised earlier.
Big Data has many elements that require the explanation and reasoned discussion a peer-reviewed journal provides. The format may differ from that of traditional articles, but it is science nonetheless. JAWRA will welcome articles on Big Data.