CHAPTER 5

DATA TYPES

INTRO

The two most common structured data types are arrays and records.
These and other data types are specified by type operators, or constructors, which are used to form type expressions.
It is convenient, both logically and concretely, to think of variables in terms of descriptors.
A descriptor is the collection of the attributes of a variable.
In an implementation, a descriptor is a collection of memory cells that store variable attributes.
The word "object" is often associated with the value of a variable and the space it occupies.

PRIMITIVE DATA TYPES

Data types that are not defined in terms of other types are called primitive data types.
The primitive data types of a language are used, along with one or more type constructors, to provide the structured types.

NUMERIC TYPES

INTEGER

The most common primitive numeric data type is integer.
Many computers support several sizes of integers, and these capabilities are reflected in some programming languages.
For example, Ada allows these: Short_Integer, Integer, and Long_Integer.
A signed integer is represented by a string of bits, with one bit (typically the leftmost) representing the sign.
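The sign bit can be observed in C, whose exact-width types int8_t and int32_t use two's complement, the scheme most computers use for negative integers. This is an illustrative sketch; the function names sign_bit and bits_of are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the leftmost (sign) bit of a 32-bit signed integer:
   1 for negative values, 0 otherwise. Converting to uint32_t is
   well defined (the value is taken modulo 2^32), so the shift is safe. */
int sign_bit(int32_t x) {
    return (int)((uint32_t)x >> 31);
}

/* Returns the raw bit pattern of an 8-bit two's complement value;
   for example, -1 is stored as 11111111 (0xFF). */
uint8_t bits_of(int8_t x) {
    return (uint8_t)x;
}
```

Like Ada, C also offers several integer sizes (short, int, long, long long); the C standard fixes only their minimum ranges, not their exact sizes.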

FLOATING-POINT

Floating-point data types model real numbers, but the representations are only approximations for most real values.
On most computers, floating-point numbers are stored in binary, which exacerbates the problem: even a simple decimal value such as 0.1 cannot be represented exactly.
Floating-point values are represented as fractions and exponents.
Most newer computers use the IEEE 754 standard format.
Many languages provide two floating-point types, commonly named float and double.
A float is usually stored in four bytes (32 bits) of memory.
A double uses twice as much storage, which gives it a larger exponent range and more fraction bits.
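A small C sketch of the approximation problem, assuming IEEE 754 doubles (as on mainstream hardware). The helper sum_tenths is hypothetical.

```c
#include <assert.h>

/* Adds 0.1 to a running total n times. Because 0.1 has no exact
   binary representation, each addition rounds and the error
   accumulates. */
double sum_tenths(int n) {
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += 0.1;
    }
    return total;
}
```

On an IEEE 754 system, sum_tenths(10) yields a value just below 1 rather than exactly 1, which is why floating-point results are usually compared with a tolerance instead of ==.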

DECIMAL

Most larger computers that are designed to support business systems applications have hardware support for decimal data types.
Decimal data types store a fixed number of decimal digits, with the decimal point at a fixed position in the value. These types are essential to COBOL.
Decimal types have the advantage of storing decimal values precisely, at least those within a restricted range, which floating-point types cannot do. Their disadvantages are that the range of values is restricted because no exponents are allowed, and their representation in memory is wasteful.
Decimal values are stored much like character strings, using binary codes for the decimal digits; this representation is called binary coded decimal (BCD).
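To illustrate BCD, here is a C sketch that stores one decimal digit in each 4-bit nibble (packed BCD, two digits per byte). The function names are hypothetical, and real decimal hardware formats differ in details such as how the sign is encoded.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a value in 0..99999999 into packed BCD: each nibble holds
   one decimal digit, with the least significant digit lowest. */
uint32_t to_packed_bcd(uint32_t value) {
    assert(value <= 99999999u);
    uint32_t bcd = 0;
    int shift = 0;
    do {
        bcd |= (value % 10u) << shift;  /* place the next digit */
        value /= 10u;
        shift += 4;
    } while (value != 0);
    return bcd;
}

/* Reverses to_packed_bcd, checking that every nibble is a digit. */
uint32_t from_packed_bcd(uint32_t bcd) {
    uint32_t value = 0;
    uint32_t place = 1;
    while (bcd != 0) {
        uint32_t digit = bcd & 0xFu;
        assert(digit <= 9u);
        value += digit * place;
        place *= 10u;
        bcd >>= 4;
    }
    return value;
}
```

For example, 1234 packs to 0x1234 and occupies 16 bits, whereas plain binary needs only 11 bits; that gap is the storage cost of decimal representation.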

BOOLEAN TYPES

These are the simplest of all types and were introduced in ALGOL 60.
Their range of values has only two elements: true and false.
Boolean types are often used to represent switches or flags in programs.

CHARACTER TYPES

Character data are stored in computers as numeric codings.
The most commonly used coding is ASCII.
Unicode, originally defined as a 16-bit character set, has been developed as an alternative.
Java was the first widely used language to use the Unicode character set.
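Because characters are numeric codes, C allows arithmetic on them. A sketch, assuming an ASCII-compatible encoding for the letter conversion; the helper names are hypothetical.

```c
#include <assert.h>

/* Converts a lowercase letter to uppercase. Assumes an
   ASCII-compatible encoding, where 'a'..'z' and 'A'..'Z' are each
   contiguous and differ by 32. Other characters pass through. */
int ascii_to_upper(int c) {
    if (c >= 'a' && c <= 'z') {
        return c - ('a' - 'A');
    }
    return c;
}

/* The C standard guarantees that '0'..'9' are contiguous, so this
   conversion from digit character to numeric value is portable. */
int digit_value(char c) {
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```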

CHARACTER STRING TYPES

A character string type is one in which the values consist of sequences of characters.
Character string constants are used to label output, and input and output of all kinds of data are often done in terms of strings.

DESIGN ISSUES

Should strings be a special kind of character array or a primitive type?
Should strings have static or dynamic length?

STRINGS AND THEIR OPERATIONS

When strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language.
In Ada, String is a predefined type for single-dimensioned arrays of Character elements.
Character string catenation in Ada is an operation specified by the “&” operator.

Ex. name1 := name1 & name2;

C and C++ use char arrays to store character strings, terminating each string with a null character.
Some of the most commonly used library functions for character strings in C and C++ are strcpy, which moves strings; strcat, which catenates one given string onto another; and strcmp, which lexicographically compares two strings (by the order of their character codes).
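The three library functions in use, in a short C sketch; the function full_name is hypothetical, and the caller must supply a buffer large enough for the result.

```c
#include <assert.h>
#include <string.h>

/* Builds "first last" in dest. cap is the size of dest in bytes and
   must leave room for the terminating null character. */
void full_name(char *dest, size_t cap, const char *first, const char *last) {
    assert(strlen(first) + 1 + strlen(last) + 1 <= cap);
    strcpy(dest, first);  /* move first into dest */
    strcat(dest, " ");    /* catenate a separator */
    strcat(dest, last);   /* catenate the last name */
}
```

strcmp returns zero for equal strings, a negative value when the first string orders before the second, and a positive value otherwise; only the sign of the result is meaningful.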

STRING LENGTH OPTIONS

There are several design choices regarding the length of string values.
First, the length can be static and specified in the declaration; such a string is called a static length string.
The second option is to allow strings to have varying length up to a declared and fixed maximum set by the variable's definition; these are called limited dynamic length strings (C-style strings are an example).
The third option is to allow strings to have varying length with no maximum; these are called dynamic length strings (as in Perl and JavaScript).
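Dynamic length strings require run-time storage management. A minimal C sketch, assuming the hypothetical DynString type and dyn_* functions below: the buffer is reallocated, doubling its capacity, whenever an append would overflow it. Allocation failure simply triggers an assert, which is acceptable only in a sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A dynamic length string: no fixed maximum length. */
typedef struct {
    char  *data;  /* null-terminated contents */
    size_t len;   /* characters in use, excluding the null */
    size_t cap;   /* bytes allocated for data */
} DynString;

void dyn_init(DynString *s) {
    s->cap = 8;
    s->len = 0;
    s->data = malloc(s->cap);
    assert(s->data != NULL);  /* sketch: abort on allocation failure */
    s->data[0] = '\0';
}

/* Appends text, doubling the capacity until the result fits. */
void dyn_append(DynString *s, const char *text) {
    size_t add = strlen(text);
    size_t needed = s->len + add + 1;  /* +1 for the null character */
    if (needed > s->cap) {
        size_t new_cap = s->cap;
        while (new_cap < needed) {
            new_cap *= 2;
        }
        char *p = realloc(s->data, new_cap);
        assert(p != NULL);  /* sketch: abort on allocation failure */
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, text, add + 1);  /* copies the null too */
    s->len += add;
}

void dyn_free(DynString *s) {
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Doubling keeps the average cost of an append low at the price of some unused space; a limited dynamic length string would instead reject or truncate text beyond its declared maximum.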

USER DEFINED ORDINAL TYPES

An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers.

ENUMERATION TYPES

An enumeration type is one in which all of the possible values, which become symbolic constants, are enumerated in the definition.
Ex. in Ada: type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
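Enumeration types provide advantages in both readability and reliability: named values such as Mon are clearer than coded integers, and in Ada the compiler rejects arithmetic on enumeration values and assignments that mix enumeration types.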
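A C counterpart of the Ada DAYS type, as a sketch with hypothetical helper names. Note that C enumerations are weaker than Ada's: the enumerators are int constants, so the compiler does not forbid arithmetic on them.

```c
#include <assert.h>

/* Symbolic constants Mon..Sun with values 0..6. */
typedef enum { Mon, Tue, Wed, Thu, Fri, Sat, Sun } Day;

int is_weekend(Day d) {
    assert((int)d >= (int)Mon && (int)d <= (int)Sun);
    return d == Sat || d == Sun;
}

/* Successor with wraparound. C has no built-in successor operation
   for enumerations, so it is written explicitly. */
Day next_day(Day d) {
    assert((int)d >= (int)Mon && (int)d <= (int)Sun);
    return d == Sun ? Mon : (Day)(d + 1);
}
```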

SUBRANGE TYPES

A subrange type is a contiguous subsequence of an ordinal type.
For example, 12..14 is a subrange of the integer type.
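C has no subrange types. The sketch below emulates, with an assert, the range check that Ada performs on a subrange variable (where a violation raises Constraint_Error). The helper names and the SUB_LO and SUB_HI bounds are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bounds of the subrange 12..14 of the integer type. */
#define SUB_LO 12
#define SUB_HI 14

bool in_subrange(int value) {
    return value >= SUB_LO && value <= SUB_HI;
}

/* Returns value if it lies in 12..14; otherwise aborts, standing in
   for a run-time range error. */
int checked_subrange(int value) {
    assert(in_subrange(value));
    return value;
}
```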

ARRAY TYPES

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
Specific elements of an array are referenced by means of a two-level syntactic mechanism, where the first part is the aggregate name, and the second part is a possibly dynamic selector consisting of one or more items known as subscripts or indexes.
If all of the indexes in a reference are constants, the selector is static; otherwise, it is dynamic.
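A static array is one in which the subscript ranges are statically bound and storage allocation is static (done before run time); the advantage is efficiency, because no dynamic allocation or deallocation is required.
A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution; the advantage is space efficiency, because a large array in one subprogram can use the same stack space as a large array in another subprogram, as long as the two are never active at the same time.
A stack-dynamic array is one in which both the subscript ranges and the storage allocation are dynamically bound at elaboration time and then remain fixed for the variable's lifetime; the advantage is flexibility, because the size of the array need not be known until the array is about to be used.
A heap-dynamic array is one in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime; the advantage is flexibility, because the array can grow and shrink as the need for space changes.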
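C can express all four categories. A sketch with hypothetical function names; the stack-dynamic case uses a C99 variable length array, which C11 made optional, so some compilers (MSVC, for example) reject it.

```c
#include <assert.h>
#include <stdlib.h>

/* Static: bound and allocated before run time; persists across calls. */
static int static_counts[10];

int bump(int i) {
    assert(i >= 0 && i < 10);
    return ++static_counts[i];
}

/* Fixed stack-dynamic: size fixed at compile time, storage allocated
   on the stack each time the function is called. */
int sum_first(int n) {
    int fixed[10];
    assert(n >= 0 && n <= 10);
    int total = 0;
    for (int i = 0; i < n; i++) {
        fixed[i] = i;
        total += fixed[i];
    }
    return total;
}

/* Stack-dynamic (variable length array): size bound when the
   declaration is elaborated, then fixed for the array's lifetime. */
int sum_vla(int n) {
    assert(n > 0);
    int vla[n];
    int total = 0;
    for (int i = 0; i < n; i++) {
        vla[i] = i;
        total += vla[i];
    }
    return total;
}

/* Heap-dynamic: allocated and resized with realloc (realloc on NULL
   behaves like malloc). Existing elements are preserved. */
int *grow(int *arr, size_t new_len) {
    int *p = realloc(arr, new_len * sizeof *p);
    assert(p != NULL);  /* sketch: abort on allocation failure */
    return p;
}
```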
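A C array has only one subscript, but its elements can themselves be arrays, which is how C supports multidimensional arrays (int m[3][4] is an array of three arrays of four ints). This is an example of orthogonality.
A slice of an array is some substructure of that array; for example, one row of a matrix.
There are two common ways in which multidimensional arrays can be mapped to one dimension in memory: row major order and column major order.
In row major order, the array is stored by rows: all elements of the first row, then all elements of the second row, and so on. C uses row major order.
In column major order, the array is stored by columns. Fortran uses column major order.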
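For an array with nrows rows and ncols columns and zero-based subscripts, element [i][j] lies i*ncols + j elements from the start in row major order, and j*nrows + i elements from the start in column major order. A C sketch of both formulas, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of [i][j] in row major order (as in C). */
size_t row_major_offset(size_t i, size_t j, size_t nrows, size_t ncols) {
    assert(i < nrows && j < ncols);
    return i * ncols + j;
}

/* Offset, in elements, of [i][j] in column major order (as in Fortran). */
size_t col_major_offset(size_t i, size_t j, size_t nrows, size_t ncols) {
    assert(i < nrows && j < ncols);
    return j * nrows + i;
}
```

Because C stores by rows, a row slice such as m[1] is contiguous in memory, while a column slice is not.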

ASSOCIATIVE ARRAYS

An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. Perl, for example, provides associative arrays (called hashes) as a built-in type.
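C has no built-in associative arrays, so a program must implement one. A minimal sketch, assuming a hash table with string keys, int values, a fixed size of 64 slots, and linear probing. The AssocArray type and assoc_* functions are hypothetical, and the sketch omits resizing and deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TABLE_SIZE 64

/* Keys are stored as pointers, so they must outlive the table. */
typedef struct {
    const char *keys[TABLE_SIZE];  /* NULL marks an empty slot */
    int values[TABLE_SIZE];
} AssocArray;

/* djb2-style string hash; unsigned overflow simply wraps. */
static unsigned long hash_key(const char *key) {
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h = h * 33 + *p;
    }
    return h;
}

void assoc_init(AssocArray *a) {
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        a->keys[i] = NULL;
    }
}

/* Inserts key or updates its value. Returns false if the table is full. */
bool assoc_put(AssocArray *a, const char *key, int value) {
    assert(key != NULL);
    size_t i = hash_key(key) % TABLE_SIZE;
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE) {
        if (a->keys[i] == NULL || strcmp(a->keys[i], key) == 0) {
            a->keys[i] = key;
            a->values[i] = value;
            return true;
        }
    }
    return false;
}

/* Looks up key; on success stores its value in *out and returns true. */
bool assoc_get(const AssocArray *a, const char *key, int *out) {
    size_t i = hash_key(key) % TABLE_SIZE;
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE) {
        if (a->keys[i] == NULL) {
            return false;  /* reached an empty slot: key absent */
        }
        if (strcmp(a->keys[i], key) == 0) {
            *out = a->values[i];
            return true;
        }
    }
    return false;
}
```

The slot a key lands in depends on its hash, not on insertion order, which reflects the unordered nature of associative arrays.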

RECORD TYPES

A record is a heterogeneous aggregate of data elements in which the individual elements are identified by names.
Records and arrays are closely related structural forms, and it is interesting to compare them. Arrays are used when all the data values have the same type and are processed in the same way. Records are used when the collection of data values is heterogeneous and the different fields are not processed in the same way. Also, the fields of a record often are not processed in a particular sequential order.
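In C, records are structs. A sketch with a hypothetical Employee record: the fields have different types and are selected by name rather than by position.

```c
#include <assert.h>

/* A record: heterogeneous fields identified by names. */
typedef struct {
    char   name[32];
    int    age;
    double hourly_rate;
} Employee;

/* Uses only the fields it needs, in no particular order. */
double weekly_pay(const Employee *e, double hours) {
    assert(hours >= 0.0);
    return e->hourly_rate * hours;
}
```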

UNION TYPES

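A union is a type whose variables may store different type values at different times during program execution.
Fortran, C, and C++ provide union constructs in which there is no language support for type checking; such unions are called free unions.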
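A C sketch of a free union and of a hypothetical tagged wrapper around it. The tag is maintained by the programmer; C itself does not record which member was last stored.

```c
#include <assert.h>

/* All members share the same storage; only one holds a meaningful
   value at any moment. */
typedef union {
    int    i;
    double d;
} Number;

typedef enum { INT_VAL, DOUBLE_VAL } Tag;

/* The tag records which member of value is current. */
typedef struct {
    Tag    tag;
    Number value;
} TaggedNumber;

double as_double(const TaggedNumber *n) {
    /* This check is manual: C would not stop code from reading
       value.i after value.d was the member last stored. */
    switch (n->tag) {
    case INT_VAL:    return (double)n->value.i;
    case DOUBLE_VAL: return n->value.d;
    }
    assert(0 && "invalid tag");
    return 0.0;
}
```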

SET TYPES

A set type is one whose variables can store unordered collections of distinct values from some ordinal type called its base type. Set types are often used to model mathematical sets.
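C has no set type, but a set over a small ordinal base type is commonly represented as a bit vector. A sketch assuming a base type of 0..31, so that a whole set fits in one uint32_t; the Set32 type and set_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit k is 1 exactly when k is a member of the set. */
typedef uint32_t Set32;

Set32 set_add(Set32 s, unsigned k) {
    assert(k < 32);
    return s | (UINT32_C(1) << k);
}

bool set_has(Set32 s, unsigned k) {
    assert(k < 32);
    return (s >> k) & 1u;
}

Set32 set_union(Set32 a, Set32 b)     { return a | b; }
Set32 set_intersect(Set32 a, Set32 b) { return a & b; }
```

Because each value has exactly one bit, elements are automatically distinct and have no order, matching the mathematical notion of a set.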

POINTER TYPES

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil.
Pointers have been designed for two uses. First, pointers provide some of the power of indirect addressing, which is heavily used in assembly language programming. Second, pointers provide a method of dynamic storage management: a pointer can be used to access a location in the area where storage is dynamically allocated, which is usually called a heap.
Variables that are dynamically allocated from the heap are called heap-dynamic variables.
The two fundamental pointer operations are assignment, which sets a pointer variable's value to some useful address, and dereferencing, which yields the value stored at the location represented by the pointer's value.
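Both operations on a heap-dynamic variable, in a short C sketch; the function new_int is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates a heap-dynamic int, stores initial in it through the
   pointer, and returns the pointer. The caller must free it. */
int *new_int(int initial) {
    int *p = malloc(sizeof *p);  /* p now holds a heap address */
    assert(p != NULL);           /* sketch: abort on allocation failure */
    *p = initial;                /* dereference: store at that address */
    return p;
}
```

Assigning one pointer to another (int *b = a;) copies the address, not the value, so both pointers then refer to the same heap-dynamic variable.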
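A dangling pointer is a pointer that contains the address of a heap-dynamic variable that has been deallocated. Using such a pointer is dangerous, because the location may since have been reallocated to some other variable.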
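One common partial defense in C is to clear a pointer as soon as its target is freed. A sketch with the hypothetical helper free_and_clear; it protects only the pointer passed to it, and any other pointer (alias) holding the same address still dangles.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the heap-dynamic variable *pp points to, then sets *pp to
   NULL so the stale address is gone and a NULL check can detect
   that the variable no longer exists. */
void free_and_clear(int **pp) {
    assert(pp != NULL);
    free(*pp);   /* free(NULL) is a no-op, so repeated calls are safe */
    *pp = NULL;
}
```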