Data Management

Introduction

Data management optimizes the use of data so that decisions and actions yield the maximum benefit. A robust data management system is essential for online systems, big data, and critical infrastructure. Without it, organizations end up with incompatible or inconsistent data sets and data quality problems, which delay projects and make integration tedious.


<Editor Opinion>

Drawing partly on Wikipedia [link], the major topics we picked and adapted from its data management article include:

  • Data Architecture
  • Data Governance
    • Data stewardship, GDPR
  • Data Quality
    • Metrics, data profiling, data scrubbing
    • "Having data that does not represent reality accurately, means that any application developed on top of them is practically useless"
  • Data Security
    • Sensitive?
  • Documents and Content
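
As an illustration of the data profiling and data scrubbing items listed under Data Quality, the following is a minimal sketch in Python. The sample records, the column names, and the helper functions `is_missing`, `profile`, and `scrub` are all hypothetical and chosen only for this example; real profiling tools compute many more metrics.

```python
# Hypothetical sample records; None and blank strings stand for missing values.
RECORDS = [
    {"id": 1, "name": "Alice ", "country": "SG"},
    {"id": 2, "name": "bob", "country": None},
    {"id": 2, "name": "bob", "country": None},  # exact duplicate
    {"id": 3, "name": "", "country": "sg"},
]


def is_missing(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile(records):
    """Data profiling: per-column completeness ratio and distinct-value count."""
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if not is_missing(v)]
        report[col] = {
            "completeness": len(present) / len(values) if values else 0.0,
            "distinct": len(set(present)),
        }
    return report


def scrub(records):
    """Data scrubbing: trim strings, upper-case country codes, drop duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        fixed = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                if key == "country":
                    value = value.upper()
                if value == "":
                    value = None  # store blanks as explicit missing values
            fixed[key] = value
        # Sort by column name only, so values of mixed types are never compared.
        fingerprint = tuple(sorted(fixed.items(), key=lambda kv: kv[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned


if __name__ == "__main__":
    print(profile(RECORDS))
    print(profile(scrub(RECORDS)))
```

Comparing the profile before and after scrubbing gives a simple quality metric: in this sample the `country` column goes from two distinct spellings to one, and the duplicate record disappears.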


Careers in Data Management

  • Chief Data Officers (CDOs)
  • Data architects
  • Data analysts
  • Data scientists
  • Data modelers
  • Database administrators (DBAs)
  • Database developers
  • Data quality analysts and engineers
  • Data integration developers
  • Data governance managers
  • Data stewards and data engineers

More Information

Big Data Management

In big data management, data is generally stored and processed in a data lake or data warehouse, preferably on object storage for its efficiency, security, and reliability.

Data Warehouse

Data warehouses are typically relational databases that store structured data from various sources. Data marts are smaller versions of a data warehouse that hold a subset of its data for local use.
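
To make the warehouse and data mart relationship concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, the sample rows, and the `apac_sales_mart` view are hypothetical. Modelling the data mart as a view is a simplification made for this example; in practice a data mart is often a separate physical database.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory 'warehouse' holding structured data from several sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, amount REAL, source TEXT)"
    )
    rows = [
        ("APAC", "widget", 120.0, "web_shop"),
        ("APAC", "gadget", 80.0, "retail_pos"),
        ("EMEA", "widget", 200.0, "web_shop"),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # The 'data mart': a smaller, region-specific slice of the warehouse.
    conn.execute(
        "CREATE VIEW apac_sales_mart AS "
        "SELECT product, SUM(amount) AS total FROM sales "
        "WHERE region = 'APAC' GROUP BY product"
    )
    return conn


def apac_totals(conn):
    """Query the data mart the way a local team would, without touching other regions."""
    cur = conn.execute(
        "SELECT product, total FROM apac_sales_mart ORDER BY product"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(apac_totals(build_warehouse()))
```

The local team only sees the APAC slice, while the full `sales` table remains the single source for every region.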

Data Lake

Data lakes are large pools of data, typically stored raw in NoSQL databases and object storage.

Data Management Best Practices

Extracted from Oracle [link]:

  1. Create a discovery layer to identify your data
  2. Develop a data science environment to efficiently repurpose your data
  3. Use autonomous technology to maintain performance levels across your expanding data tier
  4. Use discovery to stay on top of compliance requirements
  5. Use a common query layer to manage multiple and diverse forms of data storage

Cloud Data Management Tools

Examples:

  1. Amazon Web Services
  2. Microsoft Azure
  3. Google Cloud

Points to note about cloud databases are

  • the cost incurred over the life cycle, and
  • possible vendor lock-in (when migrating to another vendor is costly).

ETL Data Tools

ETL stands for Extract, Transform, and Load, the three stages used to move data in data lake processing. Some of these processes run at scheduled intervals as batch jobs, while other, smaller processes run in real time.
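
The three stages can be sketched as a small batch job in Python using only the standard library. The CSV content, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_batch` are hypothetical and exist only for this illustration; the tools listed below offer far more (scheduling, connectors, monitoring).

```python
import csv
import io
import sqlite3

# Hypothetical raw input, including one row with an unparseable amount.
RAW_CSV = """order_id,amount,currency
1, 10.50 ,usd
2,abc,usd
3,7,sgd
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types, normalise currency codes, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine this row
        clean.append(
            (int(row["order_id"]), amount, row["currency"].strip().upper())
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    # INSERT OR REPLACE keeps a re-run of the same batch from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(conn, text=RAW_CSV):
    """Run one scheduled batch and return the table contents."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_batch(sqlite3.connect(":memory:")))
```

Keeping each stage as its own function makes it possible to call `run_batch` from a scheduler for batch processing, or to call `transform` and `load` on single records for a near-real-time path.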

Examples:

  1. Stitch Data
  2. Blendo
  3. Azure Data Factory

Data Analytics and Visualization Tools

Examples:

  1. Tableau
  2. Microsoft Power BI
  3. Mode Analytics

Data Security Technologies

  • Antivirus
  • Cloud Storage
  • Email Protection
  • Firewall
  • Secure Network Access
  • Secure Wi-Fi

Publications

Huberman, A. M., & Miles, M. B. (1994). Data management and analysis methods. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 428–444). Sage Publications, Inc.

  • Abstract: [advance a comprehensive model of] the management, analysis, and interpretation of qualitative empirical materials / [this] model distinguishes within- from between-case analysis while emphasizing the necessary connection between a theory and its concepts and the empirical indicators that reflect back through the concepts to the theory / show how codes, memos, and diagrams can help a researcher work from field notes to some conceptual understanding of the processes being studied / this model stresses variables and causal links between variables while focusing on an iterative approach that is fully open to discovery and the treatment of negative cases.

Acharya S., Alonso R., Franklin M., Zdonik S. (1995). Broadcast Disks: Data Management for Asymmetric Communication Environments. In Mobile Computing. The Kluwer International Series in Engineering and Computer Science, vol 353. Springer, Boston, MA.

  • Abstract: This paper proposes the use of repetitive broadcast as a way of augmenting the memory hierarchy of clients in an asymmetric communication environment. We describe a new technique called “Broadcast Disks” for structuring the broadcast in a way that provides improved performance for non-uniformly accessed data. The Broadcast Disk superimposes multiple disks spinning at different speeds on a single broadcast channel — in effect creating an arbitrarily fine-grained memory hierarchy. In addition to proposing and defining the mechanism, a main result of this work is that exploiting the potential of the broadcast structure requires a re-evaluation of basic cache management policies. We examine several “pure” cache management policies and develop and measure implementable approximations to these policies. These results and others are presented in a set of simulation studies that substantiates the basic idea and develops some of the intuitions required to design a particular broadcast program.

Clifford Lynch, "How do your data grow?," Nature, vol. 455, no. 7209, pp. 28–29, Sep. 2008. DOI: 10.1038/455028a

  • Abstract: Data can be 'big' in different ways. National and international projects such as the Large Hadron Collider (LHC) at CERN, Europe's particle-physics laboratory near Geneva in Switzerland, or the Large Synoptic Survey Telescope planned for northern Chile, are frequently cited for the way they will challenge the state of the art in computation, networking and data storage.

Sam Madden, "From Databases to Big Data," IEEE Internet Computing, vol. 16, no. 3, pp. 4-6, Apr. 2012. DOI: 10.1109/MIC.2012.50

  • Abstract: There is a tremendous amount of buzz around the concept of "big data." In this article, the author discusses the origins of this trend, the relationship between big data and traditional databases and data processing platforms, and some of the new challenges that big data presents.
  • Summary:
    • Databases do not solve all big data issues
    • Machine learning algorithms have to change, for example by becoming part of a data management ecosystem that ordinary users can understand and learn

Jinchuan Chen, Yueguo Chen, Xiaoyong Du, Cuiping Li, Jiaheng Lu, Suyun Zhao, and Xuan Zhou, "Big data challenge: a data management perspective," Frontiers of Computer Science, vol. 7, no. 2, pp. 157–164, Apr. 2013. DOI: 10.1007/s11704-013-3903-7

  • Abstract: There is a trend that, virtually everyone, ranging from big Web companies to traditional enterprisers to physical science researchers to social scientists, is either already experiencing or anticipating unprecedented growth in the amount of data available in their world, as well as new opportunities and great untapped value. This paper reviews big data challenges from a data management perspective. In particular, we discuss big data diversity, big data reduction, big data integration and cleaning, big data indexing and query, and finally big data analysis and mining. Our survey gives a brief overview about big-data-oriented research and problems.

Read more

Note: The external host website limits the number of pages that can be read.

  • Roman, Victor (2019, Sep 27). Data Management Strategy: Introduction. Retrieved Feb. 24, 2020 from towardsdatascience.com
  • Roman, Victor (2019, Oct 3). Data Management Strategy: Part 1. Retrieved Feb. 24, 2020 from towardsdatascience.com
  • Roman, Victor (2019, Oct 14). Data Management Strategy: Part 2. Retrieved Feb. 24, 2020 from towardsdatascience.com
  • Roman, Victor (2019, Oct 20). Data Management Strategy: Part 3. Retrieved Feb. 24, 2020 from towardsdatascience.com

References

  • Weinberg, Peter (2020, Jan 7). 28 Data Management Tools & 5 Ways Of Thinking About Data Management. Retrieved Feb. 24, 2020 from Panoply blog
  • Oracle (n.d.). What Is Data Management? Retrieved Feb. 24, 2020 from Oracle
  • Rouse, Margaret (2019, Oct). What Is Data Management and Why Is It Important? Retrieved Feb. 24, 2020 from TechTarget

Keywords

Data science strategy, information management, knowledge management, I2R data management, automation, research, science, Singapore

Tags

#data #management #information #knowledge #NTU #singapore #limjunlong