4 - Datapalooza
ucla | CS 131 | 2023-10-25 14:39
Table of Contents
- Variable vs. Value
- Types
- Type Checking
- Scoping
- Memory Safety
- Garbage Collection
- Mutability
- Variable Binding Semantics
Variable vs. Value
What’s in a variable
What’s in a value
Types
- categorizes data, defines the size, encoding, operations, and casts i.e. how it’s handled
- it is not necessary that all variables in a typed language (like Python, C, etc.) have types
- in fact, no variables in Python have types
- OTOH in Haskell, even though variables are dynamically typed, bc they are immutable, when they are bound, they are given fixed types
- i.e. a var is a type if it is perpetually bound to a type of value
Primitives
- enum - an enumrated type with a set of possible values
Composites
- unions - can hold one of a few defined types
- generic types - templated or type parameters
- function types - functions stored as vars
- boxed types (mutable primitive) - an obj whose only value is a primitive
- user-defined types: the value the class defines (not the class itself), the value the struct defines, the value the enum defines
Type Checking
Strong vs Weak
- comparison
- guarantees operations invoked on appropriate types (doesn’t have to be the same types)
- guarantees no undefined behavior due to type-related issues: type and memory safe
- strong type checking
- ensures memory-safeness because it needs to check the type of the undefined pointer or array out of bounds to validate operations
- C/C++ are not strongly typed - they have undefined behavior for dangling pointers and out-of-bounds access
- Casting - uses checked casting to ensure casting to different values is ensured
- pros vs cons
Weak
- may cause undefined behavior due to type-related operations
- weak languages allow unchecked casts -> possible undefined behavior
Static vs Dynamic
- static, dynamic, also gradual and hybrid typing, covered later
Static
- even if the type is inferred, the value has a determined static non-mutable type, so it will check that all the params match the types specified
- for static typing, the var must have a fixed immutable type
- constraint satisfaction
- Examples
- Because static typing may need to infer, it is conservative and may prevent technically correct code from running if there is ambiguity in the types
- in the above, bite/scratch will only call for dog or scratch, but the fact that mammal has no bite/scratch will say it is not the correct type (at compile type)
- pros and cons
- types are associated with a value, not a variable; a variable is just a named binding
- uses secret type tags that are associated when a value is defined
- safety of operations is checked at run-time
- static may use dynamic during down-casting
- down casting - casting a superclass to a subclass (e.g. person -> student)
- pros and cons
- must support ducktyping
- iff it quacks, then it s a duck (not tat only ducks can quack, so all functions can use all functions and is only checked at run time)
- supporting operations
- some variables may be given explicit types while others are untyped
- type checking occurs partly before execution and partly during runtime
- gradual vs static vs dynamic
Language Examples
Type Conversion and Casts
- casting vs conversion
- conversion creates a new object with the converted value
- casting just treats the value as the new type without creating a new object
- implicit vs explicit cast/conversion
- whether you need to explicitly write an expression to cast to a new class
- upcast vs downcast
- upcast casts to the superclass e.g., Nerd -> Person
- upcasts may be implicit
- downcast is super to subclass e.g. Animal -> Mammal
- downcasts must be explicit
- explicit casting tells the compiler to allow a compiler error to become a runtime check
- coercions and promotions
- coercions and promotions are implicit conversions ONLY -> allows for operations even if types are not the same
- type promotions are conversions from a narrow type like
int
tofloat
which s a wider type bc it can hold more possible values
- coercions and promotions are implicit conversions ONLY -> allows for operations even if types are not the same
- widening vs narrowing conversions
- conversions can be narrowing or widening - narrowing or widening the range of possible values
- ints are narrower than doubles
- widening conversions are “value-preserving” since the values are guaranteed to be representable in the new conversion
- narrowing is from wider to narrower OR if the range of values does not perfectly overlap e.g. float -> int AND int->float are narrowing conversions and they not perfectly value-preserving in general
Casts
- conversions can be narrowing or widening - narrowing or widening the range of possible values
- widening vs narrowing casts
- upcasts are widening conversions and downcasts are narrowing
- because upcasts are guaranteed to work, they are always implicitly cast
- downcasts are always explicit since they are not always safe
Scoping
- a variable is in scope if it can be explicitly referenced in that region of code
- a var may be alive but still out of scope
- scoping is associated with the name of the variable, so scoping checks for values bound to specific names
- the variable name that is currently in scope is actively bound to its name; anything else is inactively bound
- the list of actively bound variables is collectively the lexical environment
- variable lifetimes
- value lifetimes
- static lifetimes - variable/value exists till end of calling function, not just the function enclosing block
- the usual scoping structure: sequentially nested
- Python LEGB
Dynamic Scoping
- check-in current blocks, then in the enclosing block
- then searches the calling function and recursively backward until the main
- then the global scope
- If it is not found anywhere after checking the calling function, then no such variable
- FOR PROJ2: consider passing available variables as parameters to the function
Memory Safety
- memory safety refers to whether or not the language allows for behavior that may cause undefined behavior
- Examples of memory-unsafe operations/langs
- Memory leaks are not necessarily memory-unsafe ONLY if it allows undefined behavior
- languages may run out of memory or not have garbage collection that leaves memory leaks -> but is still memory safe from an operations viewpoint
- Example of memory-safe langs/ops
- Memory management methods
- automated reclamation of heap allocated memory with no binding/reference
- a good rule is to garbage collect after each removal of reference (dangling)
- approaches
- impossible to predict when and if a given object will be freed
- determined by “memory pressure” -> sps you create files on disk -> no pressure on RAM -> no GC -> disk fills up before GC
Mark and Sweep Method
- two phases: Mark & Sweep
- goal: to discover all active objects
- consider an object in-use if 1 of 2 criteria:
Sweep Phase
- traverse all memory blocks in the heap
- check in-use flag -> purge & reset in use flags after we pass an active object
- purging/coalescing free space -> memory fragmentation -> no consecutive space to allocate new objects
Mark and Compact Method
- to mitigate memory fragmentation of mark and sweep
- mark phase is performed normally -> BUT no sweep
- collect all marked objects and move/compact to new contiguous block of memory
- then readjust pointers
- original block of allocated memory is now freed and can be used like normal
- cycle back and forth the locations of compacted data to ensure free space -> when we can no longer it means we are out of memory
Reference Counting Method
- every obj has a hidden count to track num references to it
- when ref count reaches 0, obj is deleted -> all objects transitively referenced by deleted obj must also be deleted e.g., properties of class obj
- to make this more efficient -> add ref_count = 0 objects to a list of pending to be destroyed -> destroy regularly over time
- can lead to memory fragmentation
Object End of Life
- objs may hold resources which need to be cleared
Destructors
- only in manual memory managed langs - C/C++
- deterministic rules for running
- programmer can determine when to release critical resources, close connections, delete files
Finalizers
- in GC langs, used to release unmanaged resources like connections, files when memory is reclaimed
- finalizer is not deterministic and may not even run
- last line of defense, may not even be used -> programmer should manually free unmanaged resources
Disposal Methods
- a func the programmer manually calls to free non memory resources
- used in GC langs bc finalizers are unpredictable
- guarantees release of unmanaged resources -> just dont forget to run
Mutability
- immutability is a property of a var/val/obj to make it read only
- instead of modifying we reconstruct a new obj
- immutability from lang features not hardware level walls
- can eliminate bugs, speed up GC, etc.
Immutability
- 4 approaches
Variable Binding Semantics
- binding semantics - ways langs associate var names with actual storage in RAM
- some langs directly associate, other use references with pointers
Value Semantics
- each name directly bound to the storage of the var
- var’s storage on the stack
Reference Semantics
- assign an alias name to an existing var and r/w through alias
- alias is directly bound to var’s storage
- hides its use of pointers
Object Reference Semantics
- var name is bound to pointer value that points to object value
- pointers are explicit and can be changed themselves -> assignment changes what the var points to not the value itself
- some langs like Java use diff semantics for primitives and object reference for user defined
Equality
- Object identity - do 2 object references refer to the same object at the same address: ids
- Object equality - do 2 obj refs refer to objs with equiv values (Even if at diff places in RAM)
Explicit pointer syntax
- usage of
*
and&
can be used to manually change pointer and pointed values - pointers are explicitly defined object references