Database
I wrote my notes while studying for database design in markdown so I thought I might as well share it here.
Based on the slides publicly available from Database System Concepts Chapter 1-4
Database Notes
Chapter 1 - Intro to Databases
DBMS
- Set of programs to access data
- Environment both convenient and efficient to use
Drawbacks of filesystems as DB
- Data redundancy/duplication and inconsistency of info
- Difficult to access data
- Data isolation
- Data in multiple files and formats
- Integrity problems
- Constraints in code instead of being stated explicitly
- Hard to add new constraints or change existing ones
- No atomicity
- Partial updates may be carried out in event of failure
- Concurrent access by multiple users
- For performance
- Uncontrolled concurrent access leads to inconsistencies
- Security and access control issues
DBMS uses ACID principle
- Atomicity
- If one part of transaction fails, entire transaction should fail
- Consistency
- Any transaction can only change the database in allowed ways
- Isolation
- Concurrent transactions result in system state identical to if transactions were executed serially
- Durability
- Results of a transaction are stored permanently even in the event of power loss, crashes or errors
DBMS Levels of abstraction
- Physical level
- Describes how a record is stored on the physical disk
- Logical level
- Describes how data is stored in a database and the relationships between data
- SQL create table statement
- View level
- Applications hide details of data types
- Hides information for security purposes
Instances and Schemas
- Physical Data Independence
- Ability to modify the physical level of database design without changing the logical schema
- Applications depend on logical level of database design (schema)
Relational data model
- Tabular data with rows (records) and columns (fields)
- Data can have relationships with other tables’ data
Database language
- Data Defnition Language (DDL)
- Language for defining database schema
- Create
- Includes integrity constraints like primary key and referential intergrity
- Data Manipulation Language (DML)
- Language for manipulating data
- Insert / Delete / Update
- Procedural
- User specifies what data is required and how to get those data
- Non-procedural (Declarative)
- User specifies what data is required
- SQL
- Query Language
- Language for accessing data
- Select
- SQL has all three
Database Design
- If values in a field are being repeated many times, a new table should be created with a relation back to the original table
Chapter 2 - Intro to Relational Model
Relation
- Attributes / Fields / Columns
- Domains are the set of allowed values for each attribute
- null is a member of every domain
- Example of relation
- instructor = (ID, name, dept_name, salary)
Database
- Consists of multiple relations
- Information about an enterprise broken up into parts
- Bad design eg. univ = (instructor_ID, name, dept_name, salary, student_ID, …) results in:
- Repetition of information (eg. two students with the same instructor)
- Need for null values (eg. student with no advisor)
Keys
- K is a superkey of R if values of K can uniquely identify a tuple (record) of each possible relation r(R)
- eg. {ID} and {ID,name} are superkeys of instructor
- Superkey K is a candidate key if K is minimal
- eg. {ID} is a candidate key
- Candidate key chosen by database designer as principal means of identifying tuples within a relation is a primary key
- eg. {ID} chosen as primary key
- Foreign key constraint / integrity means that the value in one relation must appear in another as the referencing / referenced relation
Relational Query Language
- Selection of tuples / records / rows
- Select a subset of all rows
- Selection of columns / attributes / fields
- Select a subset of all columns
- Discard duplicate values
- Joining two relations - Cartesian Product
- For every row in r
- For every row in s
- Create new row in r x s with all columns from both relations
- For every row in s
- For every row in r
- Union of two relations
- Set difference of two relations
- r exclude s
- Set intersection of two relations
- Joining two relations - Natural Join
- For every row in s
- If row in s has values in the same columns as a row in s, create new row with matched columns and append other existing columns’ values
- For every row in s
Chapter 3 - SQL
SQL Data Definition Language (DDL)
Create Table
create table instructor(
ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8,2),
primary key(ID),
foreign key(dept_name) references department);
not null
is an integrity constraint, name cannot take a null value
create table takes(
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4,0),
grade varchar(2),
primary key(ID, course_id, sec_id, semester, year),
foreign key(ID) references student,
foreign key(course_id, sec_id, semester, year) references section);
Since sec_id is in the primary key, a student can be registered for two sections of the same course in the same semester and year
Drop
drop table student
Deletes entire table and contents
Alter
alter table student add name varchar(20)
alter table student drop name
Dropping of attributes not usually supported
SQL Data Manipulation Language (DML)
Basic query structure
select [attribute 1], [attribute 2], ...
from [relation 1], [relation 2], ...
where [predicate]
Result of an SQL query is a relation
Select
select name from instructor
Lists the attributes desired in the result of a query, names are case-insensitive
select distinct dept_name from instructor
Removes duplicates; the keyword all
instead of distinct
will specify instead that duplicates not be removed
select * from instructor
*
denotes all attributes
select ID, name, salary/12 from instructor
Arithmetic expressions allowed in select
; in this case salary is divided by 12 when returned
Where
Specifies conditions the result must satisfy (selection predicate)
select name from instructor where dept_name = 'Comp. Sci.' and salary > 80000
Logical connectives and, or, not can be combined with comparisons
From
Lists relations involved in the query; creates a Cartesian product of all the relations involved
select * from instructor, teaches
Returns every possible instructor-teaches pair with all attributes from both relations
Joins
select section.course_id, semester, year, title from section, course where section.course_id = course.course_id and dept_name = 'Comp. Sci.'
Natural Joins
select * from instructor natural join teaches
Beware of unrelated attributes with the same name getting equated
Incorrect: select name, title from instructor natural join teaches natural join course
Correct: select name, title from (instructor natural join teaches) join course using (course_id)
Rename
select ID, name, salary monthly_salary from instructor
Refers to salary as monthly_salary instead
select distinct T.name from instructor T, instructor S where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
Find names of all instructors who have higher salary than some instructor in ‘Comp. Sci.’; Self join to compare records in same table
Like
select name from instructor where name like '%da\%r_' escape '\'
% matches any substring, _ matches any character, escape
defines an escape character
Order by
select distinct name from instructor order by dept_name, name desc
Between
select name from instructor where salary between 90000 and 100000
Set operations
(select course_id from section where sem='Fall') union (select course_id from section where sem='Spring')
Set operations union
, intersect
and except
automatically eliminate duplicates, to retain them use union all
, intersect all
and except all
Null
select name from instructor where salary is null
null signifies an unknown value or a value that does not exist
Aggregate Functions
select avg(salary) from instructor where dept_name = 'Comp. Sci.'
Aggregate functions avg
, count
, min
, max
, sum
return a value; all aggregate functions exclude null values except count(*)
which counts the number of records in a table
select dept_name, avg(salary) from instructor group by dept_name
Resulting relation has columns dept_name and avg_salary with each row being the average salary of instructors for that department
select dept_name, avg(salary) from instructor group by dept_name having avg(salary) > 42000
Predicates in the having clause are applied after the formation of groups while predicates in the where clause are applied before forming groups
Nested Subqueries
Intersection
select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and
course_id in (select course_id
from section
where semester = ’Spring’ and year= 2010);
Exclusion
select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and
course_id not in (select course_id
from section
where semester = ’Spring’ and year= 2010);
Comparing with some (at least one) records in another query
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = ’Biology’);
Comparing with all records in another query
select name
from instructor
where salary > all (select salary
from instructor
where dept_name = ’Biology’);
Check if subquery is nonempty
select course_id
from section as S
where semester = ’Fall’ and year= 2009 and
exists (select *
from section as T
where semester = ’Spring’ and year= 2010
and S.course_id= T.course_id);
Check if subquery is empty
select distinct S.ID, S.name
from student as S
where not exists ( (select course_id
from course
where dept_name = ’Biology’)
except
(select T.course_id
from takes as T
where S.ID = T.ID));
Tests for absence of duplicate tuples in result; returns true on empty set
select T.course_id
from course as T
where unique (select R.course_id
from section as R
where T.course_id = R.course_id
and R.year = 2009);
Subqueries in From clause
select dept_name, avg_salary
from (select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name)
where avg_salary > 42000;
With
Defines a temporary view (relation) only available to the query where with occurs; in this case, max_budget is the temporary table
with max_budget(value) as
(select max(budget)
from department)
select budget
from department, max_budget
where department.budget=max_budget.value;
Delete
delete from instructor where dept_name = 'Finance'
delete from instructor where salary > (select avg(salary) from instructor)
avg
is computed for every tuple, as delete
runs, avg
changes; compute avg
first then delete
Insert
insert into instructor values('10211', 'Smith', 'Biology', 66000);
Follows the order of fields declared in the create
statement
insert into instructor (ID, name, dept_name, salary) values('10211', 'Smith', 'Biology', 66000);
If cannot remember order of the attributes
insert into student select ID, name, dept_name, 0 from instructor
Inserts all instructors into the student relation with tot_creds set to 0
Update
update instructor set salary = salary * 1.03 where salary > 100000;
Increases instructors’ salaries whose salary is above 100000 by 1.03 times
update instructor
set salary = case
when salary <= 100000 then salary * 1.05
else salary * 1.03
end
Increases instructors’ salaries depending on their salary
Chapter 4 - Intermediate SQL
Outer join
Similar to join but uses null values for tuples that to not match tuples in the other relation; avoids loss of information
course left outer join prereq on course.course_id = prereq.course_id
course natural right outer join prereq
course natural full outer join prereq
Joined relations
- Join condition
- Defines which tuples in the two relations match and which attributes are in the result of the join
inner join
,left outer join
,right outer join
,full outer join
- Join type
- Defines how tuples in each relation that do not match any tuple in the other relation are treated
natural
,on <predicate\>
,using (attributes)
Views
create view v as <query expression>
View v is a virtual relation not a new relation, the expression is saved into view v and substituted into queries
create view faculty as
select ID, name, dept_name
from instructor
create view biology_faculty as
select ID, name
from faculty
where dept_name = 'Biology'
Views can be defined from other views
Integrity constraints
Unique
unique (attributes)
States that attributes form a candidate key
Check
check (predicate)
Ensures predicate is satisfied
create table section (
course_id varchar (8),
sec_id varchar (8),
semester varchar (6),
year numeric (4,0),
building varchar (15),
room_number varchar (7),
time slot id varchar (4),
primary key (course_id, sec_id, semester, year),
check (semester in (’Fall’, ’Winter’, ’Spring’, ’Summer’))
);
Referential integrity
- Ensures value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation
- eg. if “Biology” appears in one of the tuples in instructor, then there should exist a tuple in department for “Biology”
- dept_name is then a foreign key of instructor and a primary key of department
Cascade
create table course (
course_id varchar(8)
...
dept_name varchar(20),
foreign key (dept_name) references department (dept_name)
on delete cascade
on update cascade (or set null),
... );
Instead of simply rejecting a delete of Biology department, on delete cascade
deletes all courses the Biology department offers and on update cascade
updates fields accordingly; alternatives for cascade
are set null
and set default
Deferrable
create table person (
ID char(10),
name char(40),
mother char(10),
father char(10),
primary key ID,
foreign key father references person(ID) deferrable,
foreign key mother references person(ID) deferrable)
Defers constraint checking until the end of a transaction
Default
create table student (
ID varchar (5),
name varchar (20) not null,
dept_name varchar (20),
tot_cred numeric (3,0) default 0,
primary key (ID) )
Provides default value if no value is given
Index creation
create index studentID_index on student(ID)
Data structure used to speed up access to records, queries which specify a value for student.ID can now be executed by using the index without looking at all records of student
Built-in data types in SQL
- date
date '2015-04-01'
- time
time '17:00:30'
time '17:00:30.75'
- timestamp
timestamp '2015-04-01 17:00:30.75'
- interval
interval '1' day
interval '3-2' year to month
- Subtraction between above 3 types gives an
interval
interval
can be added to above 3 types
User-defined types
create type Dollars as numeric(12,2)
create table department
(dept_name varchar (20),
building varchar (15),
budget Dollars);
Domains
create domain degree_level varchar(10)
check(value in ('Bachelors', 'Masters', 'Doctorate'));
- Domains can have constraints such as
not null
but types cannot - Domains are not strongly typed and a value of one type can be transferred to one of another type
Large object types
- For pictures, videos, CAD files etc.
blob
: binary large object whose interpretation is left to an application outside of the database systemclob
: character large object is a large collection of character data
create table pictures(
name varchar(12),
pic blob(32MB));
select pic
from pictures
where name like '%SNU%';