4 - Datapalooza

ucla | CS 131 | 2023-10-25 14:39

Variable vs. Value

What’s in a variable

What’s in a value

Types

categorizes data, defines the size, encoding, operations, and casts i.e. how it’s handled
it is not necessary that all variables in a typed language (like Python, C, etc.) have types
- in fact, no variables in Python have types
- OTOH in Haskell, even though variables are dynamically typed, bc they are immutable, when they are bound, they are given fixed types
i.e. a var is a type if it is perpetually bound to a type of value
Primitives
enum - an enumrated type with a set of possible values
Composites
unions - can hold one of a few defined types
- Other Types
generic types - templated or type parameters
function types - functions stored as vars
boxed types (mutable primitive) - an obj whose only value is a primitive
user-defined types: the value the class defines (not the class itself), the value the struct defines, the value the enum defines
- interface - list function that should be implemented (but not actual implementation) defines a new type
- Value Types vs Reference Types

Type Checking

Strong vs Weak

comparison
- Strong
guarantees operations invoked on appropriate types (doesn’t have to be the same types)
guarantees no undefined behavior due to type-related issues: type and memory safe
strong type checking
ensures memory-safeness because it needs to check the type of the undefined pointer or array out of bounds to validate operations
- C/C++ are not strongly typed - they have undefined behavior for dangling pointers and out-of-bounds access
Casting - uses checked casting to ensure casting to different values is ensured
pros vs cons

Weak

may cause undefined behavior due to type-related operations
weak languages allow unchecked casts -> possible undefined behavior
Static vs Dynamic
static, dynamic, also gradual and hybrid typing, covered later
Static
even if the type is inferred, the value has a determined static non-mutable type, so it will check that all the params match the types specified
for static typing, the var must have a fixed immutable type
- constraint satisfaction
Examples
Because static typing may need to infer, it is conservative and may prevent technically correct code from running if there is ambiguity in the types
- in the above, bite/scratch will only call for dog or scratch, but the fact that mammal has no bite/scratch will say it is not the correct type (at compile type)
pros and cons
- Dynamic
types are associated with a value, not a variable; a variable is just a named binding
uses secret type tags that are associated when a value is defined
safety of operations is checked at run-time
static may use dynamic during down-casting
- down casting - casting a superclass to a subclass (e.g. person -> student)
pros and cons
must support ducktyping
- iff it quacks, then it s a duck (not tat only ducks can quack, so all functions can use all functions and is only checked at run time)
supporting operations
- iterable
- printable
- equality
  - Gradual Typing
some variables may be given explicit types while others are untyped
type checking occurs partly before execution and partly during runtime
gradual vs static vs dynamic
Language Examples

Type Conversion and Casts

casting vs conversion
- conversion creates a new object with the converted value
- casting just treats the value as the new type without creating a new object
implicit vs explicit cast/conversion
- whether you need to explicitly write an expression to cast to a new class
- upcast vs downcast
  - upcast casts to the superclass e.g., Nerd -> Person
  - upcasts may be implicit
  - downcast is super to subclass e.g. Animal -> Mammal
  - downcasts must be explicit
- explicit casting tells the compiler to allow a compiler error to become a runtime check
  - e.g., if we didn’t expollicitly cast the student to professor, do_your_thing would run a give_a_lec which would be undefined error since studnets cant give lectures
  - but explicit casting will check to throw an error if its not possible instead of proceeding
  - Conversions
coercions and promotions
- coercions and promotions are implicit conversions ONLY -> allows for operations even if types are not the same
- type promotions are conversions from a narrow type like int to float which s a wider type bc it can hold more possible values
widening vs narrowing conversions
- conversions can be narrowing or widening - narrowing or widening the range of possible values
  - ints are narrower than doubles
- widening conversions are “value-preserving” since the values are guaranteed to be representable in the new conversion
- narrowing is from wider to narrower OR if the range of values does not perfectly overlap e.g. float -> int AND int->float are narrowing conversions and they not perfectly value-preserving in general
  Casts
widening vs narrowing casts
- upcasts are widening conversions and downcasts are narrowing
- because upcasts are guaranteed to work, they are always implicitly cast
- downcasts are always explicit since they are not always safe

Scoping

a variable is in scope if it can be explicitly referenced in that region of code
a var may be alive but still out of scope
scoping is associated with the name of the variable, so scoping checks for values bound to specific names
the variable name that is currently in scope is actively bound to its name; anything else is inactively bound
the list of actively bound variables is collectively the lexical environment
- Lifetimes
variable lifetimes
value lifetimes
static lifetimes - variable/value exists till end of calling function, not just the function enclosing block
- Lexical Scoping
the usual scoping structure: sequentially nested
Python LEGB
Dynamic Scoping
check-in current blocks, then in the enclosing block
then searches the calling function and recursively backward until the main
then the global scope
If it is not found anywhere after checking the calling function, then no such variable
FOR PROJ2: consider passing available variables as parameters to the function
Memory Safety
memory safety refers to whether or not the language allows for behavior that may cause undefined behavior
Examples of memory-unsafe operations/langs
Memory leaks are not necessarily memory-unsafe ONLY if it allows undefined behavior
- languages may run out of memory or not have garbage collection that leaves memory leaks -> but is still memory safe from an operations viewpoint
Example of memory-safe langs/ops
Memory management methods
- Garbage Collection
automated reclamation of heap allocated memory with no binding/reference
a good rule is to garbage collect after each removal of reference (dangling)
approaches
- Unpredictability of GC
impossible to predict when and if a given object will be freed
determined by “memory pressure” -> sps you create files on disk -> no pressure on RAM -> no GC -> disk fills up before GC
Mark and Sweep Method
two phases: Mark & Sweep
- Mark Phase
goal: to discover all active objects
consider an object in-use if 1 of 2 criteria:
Sweep Phase
traverse all memory blocks in the heap
check in-use flag -> purge & reset in use flags after we pass an active object
purging/coalescing free space -> memory fragmentation -> no consecutive space to allocate new objects

Mark and Compact Method

to mitigate memory fragmentation of mark and sweep
mark phase is performed normally -> BUT no sweep
collect all marked objects and move/compact to new contiguous block of memory
then readjust pointers
original block of allocated memory is now freed and can be used like normal
cycle back and forth the locations of compacted data to ensure free space -> when we can no longer it means we are out of memory

Reference Counting Method

every obj has a hidden count to track num references to it
when ref count reaches 0, obj is deleted -> all objects transitively referenced by deleted obj must also be deleted e.g., properties of class obj
to make this more efficient -> add ref_count = 0 objects to a list of pending to be destroyed -> destroy regularly over time
can lead to memory fragmentation

Object End of Life

objs may hold resources which need to be cleared
Destructors
only in manual memory managed langs - C/C++
deterministic rules for running
programmer can determine when to release critical resources, close connections, delete files

Finalizers

in GC langs, used to release unmanaged resources like connections, files when memory is reclaimed
finalizer is not deterministic and may not even run
last line of defense, may not even be used -> programmer should manually free unmanaged resources

Disposal Methods

a func the programmer manually calls to free non memory resources
used in GC langs bc finalizers are unpredictable
guarantees release of unmanaged resources -> just dont forget to run

Mutability

immutability is a property of a var/val/obj to make it read only
instead of modifying we reconstruct a new obj
immutability from lang features not hardware level walls
can eliminate bugs, speed up GC, etc.
Immutability
4 approaches

Variable Binding Semantics

binding semantics - ways langs associate var names with actual storage in RAM
some langs directly associate, other use references with pointers
Value Semantics
each name directly bound to the storage of the var
var’s storage on the stack
Reference Semantics
assign an alias name to an existing var and r/w through alias
alias is directly bound to var’s storage
hides its use of pointers
Object Reference Semantics
var name is bound to pointer value that points to object value
pointers are explicit and can be changed themselves -> assignment changes what the var points to not the value itself
some langs like Java use diff semantics for primitives and object reference for user defined
Equality
Object identity - do 2 object references refer to the same object at the same address: ids
Object equality - do 2 obj refs refer to objs with equiv values (Even if at diff places in RAM)

Explicit pointer syntax

usage of * and & can be used to manually change pointer and pointed values
pointers are explicitly defined object references

Name Semantics

bind var names to an expr graph -> eval graph -> final value of var
expr graph is lazily evaluated when needed
Need Semantics
very similar to name semantics, but lang memoizes (caches) result of expr evaluation to elim unnecessary computations

4 - Datapalooza

Table of Contents

Variable vs. Value

What’s in a variable

What’s in a value

Types

Primitives

Composites

Other Types

Value Types vs Reference Types

Type Checking

Strong vs Weak

Strong

Weak

Static vs Dynamic

Static

Dynamic

Gradual Typing

Language Examples

Type Conversion and Casts

Conversions

Casts

Scoping

Lifetimes

Lexical Scoping

Dynamic Scoping

Memory Safety

Garbage Collection

Unpredictability of GC

Mark and Sweep Method

Mark Phase

Sweep Phase

Mark and Compact Method

Reference Counting Method

Object End of Life

Destructors

Finalizers

Disposal Methods

Mutability

Immutability

Variable Binding Semantics

Value Semantics

Reference Semantics

Object Reference Semantics

Equality

Explicit pointer syntax

Name Semantics

Need Semantics