Discovering the Data

From CNM Wiki

Discovering the Data (hereinafter, the Session) is a hands-on session designed to get its participants started discovering the data relevant to WorldOpp. The Session is the sixth of ten sessions of CNMCyber Bootcamps (hereinafter, the Seminar).

Defining and discovering data and metadata, identifying data sources


Outline

Understanding the DREPD is the predecessor session.

How to define data

In general terms, data is the available information about a subject, including its characteristics and features. In the CNM Wiki, data refers to the information available on the wiki about various concepts, which learners can read to gain knowledge. The wiki contains a lot of information about business management, entrepreneurship, recruitment, and employment, all geared towards helping learners gain all-round knowledge across a wide range of fields and build employable skills.

How to discover data

Data in the CNM Wiki is discovered through links to both internal and external sources. The links are found inside the text and help users navigate more easily. All linked words are highlighted in blue, and the user can click them to be redirected to other pages that contain more information about the subject.

The Wiki also uses recorded audio, video, and other materials to supplement the learning process. These are found as sections within the Wiki pages, towards the bottom of the screen. Users are welcome to develop their own materials to help new entrants navigate the site more easily and to ensure a seamless learning experience.

The site also supports file uploads, so users can upload images and other files that they feel would be beneficial to other learners. To access the library of uploaded files, users can click on the corresponding button.


Categorization

The pages are categorized in accordance with the MediaWiki conventions. This makes it easy for users to know where they are in the Wiki and to navigate to other pages of interest.

How to define metadata

Metadata refers to data that describes other data. Metadata is mostly used by system developers to explain how the user data (in this case, the study materials) has been classified or arranged. The prefix 'meta' refers to an 'underlying definition or description'. For example, author, date created, and file size are samples of basic document metadata that help users sift through documents and easily locate the specific ones they are interested in.

There are three (3) main types of metadata:

  1. Descriptive metadata - this type describes a resource for purposes of discovery and may contain information such as the title of the document, abstract, author, etc.
  2. Structural metadata - this type shows how compound objects have been joined together, for example, the pages in a Wiki.
  3. Administrative metadata - this type provides information to help manage resources and includes details such as when the data was created, access levels, and file type, among others.

Administrative metadata is broken into two branches:

    • Rights management metadata, which deals with intellectual property rights and copyright, and
    • Preservation metadata, which deals with how a resource should be preserved or archived.
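
The three types of metadata can be pictured as three groups of fields attached to one resource. The following is only a minimal sketch in Python: the class names, field names, and sample values (author, date, licence note) are hypothetical and are not taken from the CNM Wiki.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List


@dataclass
class DescriptiveMetadata:
    # Helps people discover the resource.
    title: str
    author: str
    abstract: str = ""


@dataclass
class StructuralMetadata:
    # Ordered parts (for example, sections or pages) that make up the resource.
    parts: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    # Helps manage the resource.
    created: date
    file_type: str
    access_level: str
    rights: str = ""        # rights management branch
    preservation: str = ""  # preservation branch


@dataclass
class ResourceMetadata:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# All values below are made up for illustration.
record = ResourceMetadata(
    descriptive=DescriptiveMetadata(title="Discovering the Data", author="Example Author"),
    structural=StructuralMetadata(parts=["How to define data", "How to define metadata"]),
    administrative=AdministrativeMetadata(
        created=date(2020, 1, 1),
        file_type="text/html",
        access_level="public",
        rights="hypothetical licence note",
        preservation="hypothetical archiving rule",
    ),
)

print(asdict(record)["administrative"]["file_type"])
```

Keeping the three groups separate makes it easy to see which fields support discovery, which describe composition, and which support management.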



Creating Metadata

Metadata can be created manually or automatically. Manual creation of metadata, though time-consuming, tends to give more accurate data, since it allows the creator to input details they feel would be relevant to the user.
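
The difference between the two approaches can be sketched as follows. This is an illustrative Python example only: the function names `file_metadata` and `with_manual_fields`, and the manually entered values, are assumptions made for the sketch. The automated part reads what the operating system already records about a file; the manual part adds fields that only a person can supply.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Automated step: collect metadata the operating system already records."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def with_manual_fields(auto, **manual):
    """Manual step: merge harvested fields with fields a person typed in."""
    merged = dict(auto)
    merged.update(manual)
    return merged


# Demonstration on a temporary file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    demo_path = handle.name

auto = file_metadata(demo_path)
full = with_manual_fields(auto, author="Example Author", keywords=["data", "metadata"])
os.remove(demo_path)

print(full)
```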

How to discover metadata

Metadata discovery, also known as metadata harvesting, is the process of using automated tools to analyze data sets with the aim of discovering their semantics. This process results in a mapping of data elements to the metadata registry.

Metadata is different from unstructured text that describes a resource. Metadata is structured: it uses elements with defined semantics to describe a resource, which allows for machine processing.
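
A very small version of metadata discovery might look like the sketch below. It assumes a hypothetical registry (`REGISTRY`) and a made-up sample data set (`SAMPLE`); the type-guessing rules are deliberately simplistic and are not how any particular harvesting tool works. The sketch inspects each column, guesses its type, and maps the column to a registry definition where one exists.

```python
import csv
import io

# Hypothetical registry: maps a known element name to its agreed meaning.
REGISTRY = {
    "author": "Person who created the resource",
    "created": "Date the resource was created",
    "size_kb": "File size in kilobytes",
}

# Made-up sample data set.
SAMPLE = """author,created,size_kb,notes
Alice,2021-03-01,12,first draft
Bob,2021-04-15,48,reviewed
"""


def infer_type(values):
    """Guess a simple type from the values found in one column."""
    if all(v.isdigit() for v in values):
        return "integer"
    if all(len(v) == 10 and v[4] == "-" and v[7] == "-" for v in values):
        return "date"
    return "text"


def discover(csv_text):
    """Analyze every column and map it to the registry where possible."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        report[column] = {
            "type": infer_type(values),
            "registry_definition": REGISTRY.get(column),  # None when unmapped
        }
    return report


report = discover(SAMPLE)
for column, details in report.items():
    print(column, details)
```

Columns that come back with no registry definition (here, `notes`) are the ones a person would need to review and describe by hand.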

In the CNM Wiki, metadata has been used extensively and includes:

  • Categories
  • Interwiki links
  • Notices such as copyright violation, under construction, protection, and so on
  • External links
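
Several of these items are visible in a page's wikitext and can be pulled out mechanically. The sketch below is an illustration only: the sample wikitext, the category names, and the function name `extract_page_metadata` are invented, and the regular expressions cover only the common MediaWiki forms `[[Category:...]]`, `{{Template}}`, and `[https://... label]`, not every variant.

```python
import re

# Invented sample page; not real CNM Wiki content.
SAMPLE_WIKITEXT = """{{Under construction}}
'''Discovering the Data''' is a hands-on session.
See [https://www.mediawiki.org/wiki/Help:Categories the MediaWiki help page].
[[Category:Example sessions]]
[[Category:Example bootcamps|Data]]
"""

# [[Category:Name]] or [[Category:Name|sort key]]
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE)
# {{Template}} or {{Template|parameters}}
TEMPLATE_RE = re.compile(r"\{\{\s*([^}|]+?)\s*(?:\|[^}]*)?\}\}")
# [https://example.org label] (single bracket, so internal [[links]] are skipped)
EXTERNAL_RE = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)")


def extract_page_metadata(wikitext):
    """Return the categories, notice templates, and external links found in a page."""
    return {
        "categories": [name.strip() for name in CATEGORY_RE.findall(wikitext)],
        "templates": [name.strip() for name in TEMPLATE_RE.findall(wikitext)],
        "external_links": EXTERNAL_RE.findall(wikitext),
    }


page_metadata = extract_page_metadata(SAMPLE_WIKITEXT)
print(page_metadata)
```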

How to identify data source

A data source is the place where data is stored for retrieval and use. Data can be split into two (2) broad categories:

  • Qualitative data -
  • Quantitative data - this involves numbers and statistics and allows users to perform mathematical operations in order to make decisions and draw conclusions about the subject


There are two main types of data sources:

  • Machine data sources
  • File data sources

Both types of data sources contain information about the data but differ in regard to how the data is stored.

Data can exist in three main formats:

  • Unstructured data - this is a raw form of data and can be any type of file (PDF, images, videos, audio). This type of data is usually stored in a repository of files.
  • Semi-structured data - this is data that is consistent and well defined. It is semi-structured in the sense that it is not necessarily in tabular form and may be incomplete. It is often stored as files.
  • Structured data - this type of data is stored in tabular format and is usually well defined, in the sense that the user knows how many rows and columns there are and what kind of information is stored in them. It can be stored in and queried from a database to create datasets.
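
The three formats can be compared side by side in a short Python sketch. The titles, view counts, and video length below are made up for illustration; CSV stands in for structured data, JSON for semi-structured data, and a raw byte string for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured_text = "title,views\nDiscovering the Data,120\nAnalyzing the Sources,95\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_views = sum(int(row["views"]) for row in rows)

# Semi-structured: self-describing keys, but a record may omit a field.
semi_structured_text = '[{"title": "Intro video", "length_min": 12}, {"title": "Slides"}]'
records = json.loads(semi_structured_text)
lengths = [r.get("length_min") for r in records]  # a missing field becomes None

# Unstructured: raw bytes with no built-in fields to query.
unstructured = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # first 8 bytes are the PNG signature
looks_like_png = unstructured.startswith(b"\x89PNG\r\n\x1a\n")

print(total_views, lengths, looks_like_png)  # 215 [12, None] True
```

The structured rows can be summed directly, the semi-structured records need a check for missing fields, and the unstructured bytes reveal nothing beyond their file signature without separate metadata.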


In the scope of the Seminar, data refers to the recorded videos, audio, images, and lectures developed about various subjects. The learner is expected to rely on these sources of information, as well as on external sources, to understand the concepts illustrated in the learning materials.

When working with external sources, learners are expected to use web search engines to find relevant information about a concept or subject. When working with internal data sources contained in the CNM Wiki, users can make use of metadata to easily search for and identify the data they are interested in. This metadata would include the type of resource (whether an image, text, or PDF document), the date it was created, and any copyright issues.
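
Searching by metadata can be sketched as a filter over a catalogue of records. The catalogue, its titles, dates, and rights values, and the `search` function below are all hypothetical; they only illustrate how resource type, creation date, and copyright information narrow down a search.

```python
from datetime import date

# Hypothetical catalogue; titles, dates, and rights values are illustrative only.
CATALOGUE = [
    {"title": "Session overview", "type": "text", "created": date(2021, 1, 10), "rights": "CC BY-SA"},
    {"title": "Workflow diagram", "type": "image", "created": date(2021, 2, 5), "rights": "CC BY-SA"},
    {"title": "Reading list", "type": "pdf", "created": date(2020, 11, 20), "rights": "All rights reserved"},
]


def search(catalogue, resource_type=None, created_after=None, rights=None):
    """Return the records that match every filter that was supplied."""
    results = []
    for record in catalogue:
        if resource_type is not None and record["type"] != resource_type:
            continue
        if created_after is not None and record["created"] <= created_after:
            continue
        if rights is not None and record["rights"] != rights:
            continue
        results.append(record)
    return results


recent_open = search(CATALOGUE, created_after=date(2021, 1, 1), rights="CC BY-SA")
images = search(CATALOGUE, resource_type="image")

print([r["title"] for r in recent_open])
print([r["title"] for r in images])
```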

Analyzing the Sources is the successor session.

Materials

Recorded audio

Recorded video

Live sessions

Texts and graphics

See also