Question 1

Which component of DBMS is responsible for the metadata or the 'data about data'?

Accepted Answer

Data Dictionary. • **Data Dictionary** = a centralised metadata repository that stores definitions and descriptions of all database objects — table names, column names, data types, constraints, relationships, and access permissions. It is the system's own record of its structure, often called the 'system catalogue'.
• **Role in query processing** — before executing any SQL statement, the DBMS consults the data dictionary to validate table and column names, check user privileges, enforce data types, and resolve references; without it, the DBMS cannot safely process any query.
• Database administrators use the data dictionary to audit schema changes, trace relationships between tables, and provide documentation for developers joining a project.
• 💡 Option A (Query Processor) is wrong because the query processor parses and executes SQL statements but does not store schema metadata; Option B (Transaction Manager) is wrong because it coordinates ACID compliance during transactions, not metadata storage; Option C (Database Engine) is wrong because the engine handles data storage and retrieval operations, not object definitions.

Question 2

What is the process of minimizing data redundancy and dependency by organizing fields and tables?

Accepted Answer

Normalization. • **Normalization** = the systematic process of organising a relational database into well-structured tables to eliminate data redundancy and prevent update, insertion, and deletion anomalies. It proceeds through progressive stages called Normal Forms — 1NF, 2NF, 3NF, and BCNF.
• **Anomalies it prevents** — an update anomaly occurs when the same fact is stored in multiple rows, forcing multiple edits to keep data consistent. Normalization ensures each data fact lives in exactly one place, so one update is sufficient and inconsistency is impossible.
• Normalising to 3NF is standard practice; it balances data integrity with reasonable query complexity, avoiding the overhead of too many joins.
• 💡 Option B (Indexing) is wrong because indexing accelerates data retrieval but does not restructure tables or remove redundancy; Option C (Encapsulation) is wrong because encapsulation is an object-oriented programming concept that bundles data with its methods; Option D (Querying) is wrong because querying is the act of requesting data, not a structural design process.

Question 3

Which of the following is an example of a NoSQL database?

Accepted Answer

MongoDB. • **MongoDB** = a leading NoSQL database that stores records as flexible BSON (Binary JSON) documents inside collections rather than in fixed rows and columns. Because each document can have a different structure, MongoDB suits applications where the data schema evolves frequently or varies between records.
• **Why NoSQL** — NoSQL databases trade strict relational structure for horizontal scalability and schema flexibility, handling unstructured data like social media posts, product catalogues, and real-time event streams far more easily than traditional RDBMS.
• MongoDB uses collections instead of tables and documents instead of rows; it provides powerful querying, indexing, and aggregation while remaining schema-free.
• 💡 Option A (MySQL) is wrong because MySQL is a relational database requiring predefined table schemas and SQL; Option B (Oracle) is wrong because Oracle Database is a full-featured RDBMS with strict table structures; Option D (PostgreSQL) is wrong because PostgreSQL is an advanced open-source RDBMS that uses structured tables and standard SQL.

Question 4

What is 'Data Independence' in a DBMS environment?

Accepted Answer

The ability to change the schema at one level without changing it at another. • **Data Independence** = the ability to modify the schema at one level of the database architecture without requiring changes at other levels. It is the key advantage of layered abstraction in DBMS design and is classified into two types: logical and physical.
• **Two types** — Physical Data Independence allows changes to storage structures (switching storage media, reorganising indexes) without affecting the logical schema; Logical Data Independence allows changes to the conceptual schema (adding a new column) without altering user-facing views or application code.
• Without data independence, any reorganisation of a table would break every application accessing that table, making database maintenance prohibitively expensive.
• 💡 Option A (store data on any hard drive) is wrong because that describes portability, not schema-level independence; Option B (use data without a password) is wrong because that describes open or unauthenticated access; Option C (share data between two companies) is wrong because that describes data sharing or interoperability, not independence between schema levels.

Question 5

In the relational model, what is the term for a table?

Accepted Answer

Relation. • **Relation** = the formal mathematical term for a table in the relational model, introduced by E.F. Codd in his 1970 landmark paper. A relation is defined as a set of tuples (rows) sharing the same set of attributes (columns), with no duplicate tuples allowed — grounding database design in set theory.
• **Why the term matters** — calling a table a 'relation' reflects its mathematical nature as a relationship among attribute domains; this precise vocabulary underpins formal query languages like relational algebra from which SQL is derived.
• Codd's relational model replaced older hierarchical and network models and became the theoretical foundation for all SQL-based RDBMS systems still dominant today.
• 💡 Option B (Entity) is wrong because an entity is a real-world object modelled in an ER diagram that maps to a table, but is not the formal term for the table itself; Option C (Domain) is wrong because a domain is the set of permissible values for a single attribute; Option D (Instance) is wrong because an instance refers to the actual data present in a relation at one specific point in time.

Question 6

What is the maximum number of primary keys a single table can have?

Accepted Answer

One. • **One primary key per table** = every relational table can have exactly one primary key, which may be a single column or a composite of multiple columns. Even a composite primary key counts as one primary key; the uniqueness constraint applies to the combined value of all participating columns.
• **Composite primary key example** — in an Enrollment table linking students and courses, neither Student_ID nor Course_ID alone is unique, but together (Student_ID, Course_ID) uniquely identifies each enrollment, forming a single composite primary key.
• Having multiple primary keys on one table is logically impossible in the relational model because each row can have only one canonical identifier.
• 💡 Option B (Two) is wrong because no relational standard permits two primary keys on one table; Option C (Depends on the columns) is wrong because the rule is absolute — one primary key regardless of column count; Option D (Unlimited) is wrong because unlimited primary keys would undermine the concept of a single definitive row identifier.

Question 7

Which SQL command is used to remove all records from a table without deleting the table structure?

Accepted Answer

TRUNCATE. • **TRUNCATE** = a DDL command that removes all rows from a table in one operation while preserving the table's structure, columns, and constraints. Because it deallocates data pages directly rather than logging individual row deletions, it is significantly faster than DELETE for large tables.
• **Key distinction from DELETE** — DELETE is a DML command that logs each deleted row individually, supports a WHERE clause to remove specific rows, and can be rolled back inside a transaction; TRUNCATE cannot be filtered with WHERE and, in many databases, cannot be rolled back.
• After TRUNCATE, an auto-increment column typically resets its counter to 1, whereas DELETE leaves the counter at its last value.
• 💡 Option A (DROP) is wrong because DROP deletes the entire table structure along with all its data, not just the rows; Option B (DELETE) is wrong because DELETE removes rows one by one and can use a WHERE clause, but is slower and logs each deletion; Option D (REMOVE) is wrong because REMOVE is not a valid SQL command.

Question 8

Which of the following describes the 'Isolation' property in ACID?

Accepted Answer

Transactions are executed as if they are the only ones on the system. • **Isolation** = the ACID property that ensures concurrent transactions execute as though each is the only transaction running on the system. Intermediate, uncommitted changes made by one transaction are invisible to all other transactions until that transaction commits.
• **Isolation levels** — SQL defines four isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) that trade off between performance and the degree of protection against phenomena like dirty reads, non-repeatable reads, and phantom reads.
• Without Isolation, two users updating the same bank account simultaneously could each read the original balance, add their deposits, and overwrite each other's changes — a classic 'lost update' problem.
• 💡 Option A (Data is kept in a single file) is wrong because file storage has nothing to do with transaction isolation; Option B (Deleting data from one table affects others) is wrong because that describes cascading deletes via foreign keys, not isolation; Option C (Data is only accessible by one user at a time) is wrong because isolation does not block simultaneous access — it controls visibility of intermediate states.

Question 9

What is a 'Candidate Key' in a database?

Accepted Answer

A set of attributes that can uniquely identify a tuple. • **Candidate Key** = any minimal set of one or more attributes that can uniquely identify every tuple in a relation. 'Minimal' means removing any attribute from the set would destroy its uniqueness property, so no redundant columns are included.
• **Relationship to Primary Key** — from all candidate keys in a table, the database designer selects one as the Primary Key; the remaining candidate keys become Alternate Keys. All candidate keys share the same two mandatory properties: uniqueness and non-nullability.
• A table can have multiple candidate keys; for example, an Employee table might have both Employee_ID and Email as separate candidate keys since both uniquely identify an employee.
• 💡 Option B (A key that can never be a primary key) is wrong because any candidate key is eligible to be chosen as the primary key; Option C (A key used for encryption) is wrong because encryption keys are a cryptography concept, not a database key type; Option D (A temporary key created during a search) is wrong because candidate keys are permanent schema-level attributes, not runtime constructs.

Question 10

Which database model represents data as a tree-like structure with parent-child relationships?

Accepted Answer

Hierarchical Model. • **Hierarchical Model** = a database model that organises data in a tree structure where each record (node) has exactly one parent but can have multiple children. The root node at the top has no parent, and every path from root to leaf represents a chain of parent-child relationships.
• **Historical significance** — IBM's Information Management System (IMS), introduced in the 1960s for the Apollo space programme, was one of the first commercial hierarchical databases and was widely used on mainframes for decades.
• The main limitation of the hierarchical model is that it cannot naturally represent many-to-many relationships; accessing a record always requires traversing from the root, which can be slow for deeply nested data.
• 💡 Option A (Relational Model) is wrong because it stores data in flat tables linked by keys, with no inherent tree structure; Option B (Network Model) is wrong because it allows many-to-many relationships using a graph structure where a child can have multiple parents; Option D (Object-Oriented Model) is wrong because it stores data as objects with attributes and methods, mirroring object-oriented programming.

DBMS Basics — Set 2