Cluster table
Introduction
A cluster is a
schema object that contains one or more
tables that all have one or more columns in
common. Rows of one or more tables that
share the same value in these common columns
are physically stored together within the
database.
Generally, you
should only cluster tables that are
frequently joined on the cluster key columns
in SQL statements. Clustering multiple
tables improves the performance of joins,
but it is likely to reduce the performance
of full table scans, INSERT statements, and
UPDATE statements that modify cluster key
values.
Cluster Keys
The columns
defined by the CREATE CLUSTER command make
up the cluster key. These cluster columns
must correspond in both datatype and size to
columns in each of the clustered tables,
although they need not correspond in name.
Types of
Clusters
A cluster can
be either an indexed cluster or a hash
cluster.
Indexed
Clusters
In an indexed
cluster, Oracle stores together rows having
the same cluster key value. Each distinct
cluster key value is stored only once in
each data block, regardless of the number of
tables and rows in which it occurs. This
saves disk space and improves performance
for many operations.
You may want
to use indexed clusters in the following
cases:
Your queries
retrieve rows over a range of cluster key
values.
Your clustered
tables may grow unpredictably.
After you
create an indexed cluster, you must create
an index on the cluster key before you can
issue any data manipulation language (DML)
statements against a table in the cluster.
This index is called the cluster index.
A cluster
index provides quick access to rows within a
cluster based on the cluster key. If you
issue a SQL statement that searches for a
row in the cluster based on its cluster key
value, Oracle searches the cluster index for
the cluster key value and then locates the
row in the cluster based on its ROWID.
Hash Clusters
In a hash
cluster, Oracle stores together rows that
have the same hash key value. The hash value
for a row is the value returned by the
cluster's hash function. When you create a
hash cluster, you can either specify a hash
function or use the Oracle internal hash
function. Hash values are not actually
stored in the cluster, although cluster key
values are stored for every row in the
cluster.
Cases to use
Hash Clusters
You may want
to use hash clusters in the following cases:
1- Your
queries retrieve rows based on equality
conditions involving all cluster key
columns.
2- Your
clustered tables are static or you can
determine the maximum number of rows and the
maximum amount of space required by the
cluster when you create the cluster.
The hash
function provides access to rows in the
table based on the cluster key value. If you
issue a SQL statement that locates a row in
the cluster based on its cluster key value,
Oracle applies the hash function to the
given cluster key value and uses the
resulting hash value to locate the matching
rows. Because multiple cluster key values
can map to the same hash value, Oracle must
also check the row's cluster key value. This
process often results in less I/O than the
process for the indexed cluster, because the
index search is not required.
Oracle's
internal hash function returns values
ranging from 0 to the value of HASHKEYS - 1.
If you specify a column with the HASH IS
clause, the column values need not fall into
this range. Oracle divides the column value
by the HASHKEYS value and uses the remainder
as the hash value. The hash value for null
is HASHKEYS - 1. Oracle also rounds the
HASHKEYS value up to the nearest prime
number to obtain the actual number of hash
values. This rounding reduces the likelihood
of hash collisions, or multiple cluster key
values having the same hash value. You
cannot create a cluster index for a hash
cluster, and you need not create an index on
a hash cluster key. If you cannot fit all
rows for one hash value into a data block,
do not use hash clusters. Performance is
very poor in this circumstance because an
insert or update of a row in a hash cluster
with a size exceeding the data block size
fills the block and performs row chaining to
contain the rest of the row.
Cluster Size
Oracle uses
the value of the SIZE parameter to determine
the space reserved for rows corresponding to
one cluster key value or one hash value.
This space then determines the maximum
number of cluster or hash values stored in a
data block. If the SIZE value is not a
divisor of the data block size, Oracle uses
the next largest divisor. If the SIZE value
is larger than the data block size, Oracle
uses the operating system block size,
reserving at least one data block per
cluster or hash value. Oracle also considers
the length of the cluster key when
determining how much space to reserve for
the rows having a cluster key value. Larger
cluster keys require larger sizes. To see
the actual size, query the KEY_SIZE column
of the USER_CLUSTERS data dictionary view.
This does not apply to hash clusters because
hash values are not actually stored in the
cluster. Although the maximum number of
cluster and hash key values per data block
is fixed on a per `-cluster basis, Oracle
does not reserve an equal amount of space
for each cluster or hash key value. Varying
this space stores data more efficiently,
because the data stored per cluster or hash
key value is rarely fixed. A SIZE value
smaller than the space needed by the average
cluster or hash key value may require the
data for one cluster key or hash key value
to occupy multiple data blocks. A SIZE value
much larger results in wasted space.
When you
create a hash cluster, Oracle immediately
allocates space for the cluster based on the
values of the SIZE and HASHKEYS parameters.
Adding Tables
to a Cluster
You can add
tables to an existing cluster by issuing a
CREATE TABLE statement with the CLUSTER
clause. A cluster can contain as many as 32
tables, although the performance gains of
clustering are often lost in clusters of
more than four or five tables.
All tables in
the cluster have the cluster's storage
characteristics as specified by the PCTUSED,
PCTFREE, INITRANS, MAXTRANS, TABLESPACE, and
STORAGE parameters.
Example I
The following
statement creates an indexed cluster named
PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a cluster size of 512
bytes, and storage parameter values:
SQL> CREATE
CLUSTER personnel
(
department_number NUMBER(2) )
SIZE 512
STORAGE
(INITIAL 100K NEXT 50K PCTINCREASE 10);
The following
statements add the EMP and DEPT tables to
the cluster:
SQL> CREATE
TABLE emp
(empno NUMBER
PRIMARY KEY,
ename
VARCHAR2(10) NOT NULL
CHECK (ename =
UPPER(ename)),
job
VARCHAR2(9),
mgr NUMBER
REFERENCES scott.emp(empno),
hiredate DATE
CHECK (hiredate >= SYSDATE),
sal
NUMBER(10,2) CHECK (sal > 500),
comm
NUMBER(9,0) DEFAULT NULL,
deptno
NUMBER(2) NOT NULL )
CLUSTER
personnel (deptno);
SQL> CREATE
TABLE dept
(deptno
NUMBER(2),
dname
VARCHAR2(9),
loc
VARCHAR2(9))
CLUSTER
personnel (deptno);
The following
statement creates the cluster index on the
cluster key of PERSONNEL:
SQL> CREATE INDEX idx_personnel ON CLUSTER
personnel;
After creating
the cluster index, you can insert rows into
either the EMP or DEPT tables.
Example II
The following
statement creates a hash cluster named
PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a maximum of 503 hash key
values, each of size 512 bytes, and storage
parameter values:
SQL> CREATE
CLUSTER personnel
(
department_number NUMBER )
SIZE 512
HASHKEYS 500
STORAGE
(INITIAL 100K
NEXT 50K
PCTINCREASE
10);
Because the
above statement omits the HASH IS clause,
Oracle uses the internal hash function for
the cluster.
Example III
The following
statement creates a hash cluster named
PERSONNEL with the cluster key made up of
the columns HOME_AREA_CODE and HOME_PREFIX,
and uses a SQL expression containing these
columns for the hash function:
SQL> CREATE
CLUSTER personnel
(
home_area_code NUMBER,
home_prefix
NUMBER )
HASHKEYS 20
HASH IS
MOD(home_area_code + home_prefix, 101);
Questions:
Q: What is a
cluster table in the Oracle database?
Q: What is a
cluster key?
Q: What are
the types of clusters?
Q: Describe an
indexed cluster?
Q: Describe an
hash cluster?
Q: When do you
use a hash cluster?
Q: What is the
Cluster Size parameter?
Q: How do you
add a table to a cluster table?
Q: What do the
following SQL statements do?
SQL> CREATE
CLUSTER personnel
(
department_number NUMBER(2) )
SIZE 512
STORAGE
(INITIAL 100K NEXT 50K PCTINCREASE 10)
/
SQL> CREATE
TABLE emp
(empno NUMBER
PRIMARY KEY,
ename
VARCHAR2(10) NOT NULL
CHECK (ename =
UPPER(ename)),
job
VARCHAR2(9),
mgr NUMBER
REFERENCES scott.emp(empno),
hiredate DATE
CHECK (hiredate >= SYSDATE),
sal
NUMBER(10,2) CHECK (sal > 500),
comm
NUMBER(9,0) DEFAULT NULL,
deptno
NUMBER(2) NOT NULL )
CLUSTER
personnel (deptno)
/
SQL> CREATE
TABLE dept
(deptno
NUMBER(2),
dname
VARCHAR2(9),
loc
VARCHAR2(9))
CLUSTER
personnel (deptno)
/
SQL> CREATE
INDEX idx_personnel ON CLUSTER personnel
/
SQL> CREATE
CLUSTER personnel
(
department_number NUMBER )
SIZE 512
HASHKEYS 500
STORAGE
(INITIAL 100K
NEXT 50K
PCTINCREASE
10)
/ |