06. Mathematical coprocessor

Base lecture in Russian

Real numbers

There is no such object in real life
- ⇒ Always a model
- ⇒ Never rely on float1 == float2 (it holds only by coincidence)
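
A minimal sketch of the "only by coincidence" point, assuming the MARS simulator and its syscalls 1 (print integer) and 10 (exit); the labels d01, d02, d03 are made up for this illustration. It adds 0.1 and 0.2 as doubles and compares the sum with 0.3:

```
.data
d01:    .double 0.1
d02:    .double 0.2
d03:    .double 0.3
.text
        l.d     $f0 d01
        l.d     $f2 d02
        l.d     $f4 d03
        add.d   $f0 $f0 $f2     # 0.1 + 0.2, rounded to a double
        c.eq.d  $f0 $f4         # C1 flag 0 = 1 only if the values are equal
        li      $a0 1           # assume "equal" ...
        bc1t    show
        li      $a0 0           # ... unless the comparison failed
show:   li      $v0 1           # print integer: 0 here, the sum is not 0.3
        syscall
        li      $v0 10          # exit
        syscall
```

None of the three constants is exactly representable in binary, so the rounded sum differs from the rounded 0.3 in the last bit. Comparing the absolute difference with some small ε is the usual workaround.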

Representation

Fixed point: 1231234.21341234
- Trailing/leading zeros: 1230000000.0/0.0000000123
Floating point: 1.213123·10²³ = 1.213123E23 = 1.213123e+23
- Normalization $$1<=mantissa<10$$
- Accidental zeros: 712E+8 + 3e-5 = 71200000000.00003

Real numbers modelling

Fixed point: very small range
Lexical (string/remainder based): too slow/complex, although perfect

⇒ floating point.

Binary fixed point: use 2^-1, 2^-2, 2^-3 etc.
155.625 =
- = 1·2⁷+0·2⁶+0·2⁵+1·2⁴+1·2³+0·2²+1·2¹+1·2⁰+1·2^-1+0·2^-2+1·2^-3 =
- = 128 + 0 + 0 + 16 + 8 + 0 + 2 + 1 + 0.5 + 0 + 0.125 = 155.625₁₀ =
- = 10011011.101₂
155.625 = 1.55625·exp₁₀⁺² = 10011011.101₂·exp₂⁰ = 1.0011011101₂·exp₂⁺¹¹¹ (111₂ = 7₁₀)

IEEE_754

But no, they thought they're all smartasses.

IEEE_754

S[  E   ][         M           ]
01110101111100010110101110101111

S - sign bit
E - biased exponent; 8 bits for 32-bit float
- E = exponent +127 for 32-bit float
M - fraction part of the mantissa (23 bits for 32-bit float)
- 2²³=8388608: with the hidden leading 1 there are 24 significant bits; a mantissa that needs more will lose its lower digits
- 2-normalized float: $$1<= mantissa <2$$
  - ⇒ mantissa always starts from 1, do not store it
- 2-denormalized float (E = 0, $$|n|<2^-126$$): $$0<=mantissa<1$$
  - ⇒ mantissa always starts from 0, do not store it
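
The loss of lower digits can be observed directly. A small sketch, assuming MARS (syscalls 1 and 10): the integer 2²⁴+1 needs 25 significant bits, so it does not survive a round trip through a single-precision float.

```
.text
        li      $t0 16777217    # 2**24 + 1 needs 25 significant bits
        mtc1    $t0 $f0         # raw integer bits into the FPU
        cvt.s.w $f0 $f0         # to single: only 24 significant bits fit
        cvt.w.s $f0 $f0         # back to integer
        mfc1    $a0 $f0
        li      $v0 1           # print integer: 16777216, the +1 is lost
        syscall
        li      $v0 10          # exit
        syscall
```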

Number	Sign (bit 31)	Biased exponent (bits 30-23)	Mantissa (bits 22-0)	Hexadecimal
155.625 (normalized)	0	10000110	00110111010000000000000	431BA000
-5.23E-39 (denormalized)	1	00000000	01110001111001100011101	8038F31D
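
The first table row can be checked by reading a float as a plain word and cutting out the exponent field. A sketch assuming MARS (syscall 34 prints an integer in hexadecimal, 11 prints a character); the label num is made up for this example:

```
.data
num:    .float  155.625
.text
        lw      $t0 num         # raw bits of the float: 0x431BA000
        move    $a0 $t0
        li      $v0 34          # MARS syscall 34: print integer as hex
        syscall
        li      $a0 10          # newline character
        li      $v0 11
        syscall
        srl     $t1 $t0 23      # drop the 23 mantissa bits
        andi    $t1 $t1 0xff    # keep the 8 exponent bits: 134
        subi    $a0 $t1 127     # remove the bias: 7
        li      $v0 1           # print integer
        syscall
        li      $v0 10          # exit
        syscall
```

The printed exponent 7 matches 1.0011011101₂·2⁷ from the binary decomposition of 155.625.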

Signed like integer
Zero like integer (hence exponent bias)
Double float: 1-bit Sign, 11-bit Exponent, 52-bit Mantissa
MARS: «Tools → Floating Point Representation»
IEEE 754 is mathematically and practically awful! longread in Russian
- ⇒ NaN, Inf etc.
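
A sketch of how Inf and NaN show up in practice, assuming MARS, where floating point division by zero does not trap (real hardware may be configured to raise an exception). The labels fzero and fone are made up for this example:

```
.data
fzero:  .float  0.0
fone:   .float  1.0
.text
        l.s     $f0 fzero
        l.s     $f2 fone
        div.s   $f12 $f2 $f0    # 1/0: the result is +Inf
        li      $v0 2           # print float
        syscall
        li      $a0 10          # newline character
        li      $v0 11
        syscall
        div.s   $f4 $f0 $f0     # 0/0: the result is NaN
        c.eq.s  $f4 $f4         # NaN is not equal even to itself
        li      $a0 1
        bc1t    show
        li      $a0 0           # the comparison failed
show:   li      $v0 1           # print integer: 0
        syscall
        li      $v0 10          # exit
        syscall
```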

FPU / C1

The concept of coprocessor: orthogonal task, data formats, performance, data flow
C0 — control coprocessor (later)
FPU MIPS:
1. IEEE 754 /32 /64
2. 32 dedicated C1 f-registers
3. = 16 double registers (pairs $f0~$f1, $f2~$f3 etc.), so only the even ones ($f0, $f2, $f4 ...) can be used for doubles
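
The pairing can be seen by assembling a double from two words by hand. A sketch assuming MARS (syscall 3 prints the double in $f12~$f13), with the low half of the double in the even register:

```
.text
        li      $t0 0x00000000  # low word of 1.0 as a double
        li      $t1 0x3ff00000  # high word: sign 0, exponent 1023, mantissa 0
        mtc1    $t0 $f12        # even register holds the low half
        mtc1    $t1 $f13        # odd register holds the high half
        li      $v0 3           # print double from the pair: 1.0
        syscall
        li      $v0 10          # exit
        syscall
```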

Instruction set

Memory:

op	cop	ft	fs	fd	funct
6bits	5bits	5bits	5bits	5bits	6bits
- op = 17
- cop = 16 for 32-bit and 17 for 64-bit
- fTarget, fSource, fDestination — f-registers
- funct — extension
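
As an illustration of the layout, the sketch below builds the machine word of add.d $f0 $f0 $f2 with shifts. It assumes MARS (syscall 34 prints hex) and funct = 0 for add, which is taken from the MIPS instruction set reference rather than from these notes:

```
.text
        add.d   $f0 $f0 $f2     # compare with its Code column in MARS
        li      $t0 17          # op = 17
        sll     $t0 $t0 26
        li      $t1 17          # cop = 17: double
        sll     $t1 $t1 21
        or      $t0 $t0 $t1
        li      $t1 2           # ft = 2 ($f2)
        sll     $t1 $t1 16
        or      $a0 $t0 $t1     # fs = 0, fd = 0, funct = 0 add no bits
        li      $v0 34          # print as hex: 0x46220000
        syscall
        li      $v0 10          # exit
        syscall
```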
Assembler:
- command.type $f_destination $f_source $f_target
  - command: add, sub, div, mul
  - type: s or d
```
   mul.s $f1 $f2 $f8
   add.d $f0 $f0 $f2
```
- command.type $f_destination $f_source
  - command: neg, abs, mov, sqrt, movf, movt
    - movf/movt — conditional move
```
   mov.s $f4 $f7
   sqrt.d $f0 $f4
```
- memory: command.type f-register offset(common-register)
  - l/s (load/store), s/d
```
   l.s $f1 40($t4)
   s.d $f6 ($t5)
```
- registers: command.type common-register f-register
  - mfc1/mtc1 (move from/to C1); the .d variants move a double
  - a double uses 2 common registers (e.g. $t0~$t1)
```
   mtc1 $t1 $f3
   mfc1.d $t2 $f4
```
- float/int conversion: command.type.type f-destination f-source
  - cvt/floor/trunc/round, s/d/w (word, i. e. integer)
  - use f-register only (why ?)
```
   cvt.w.s $f1 $f1
   floor.w.d $f2 $f4
```
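
A sketch of the difference between two of the conversions, assuming MARS (syscalls 1, 11, 10); the label val is made up for this example. The converted integer stays in an f-register, so mfc1 is needed to get it out:

```
.data
val:    .float  -2.7
.text
        l.s     $f0 val
        floor.w.s $f2 $f0       # toward minus infinity: -3
        trunc.w.s $f4 $f0       # toward zero: -2
        mfc1    $a0 $f2         # integer bits out of the FPU
        li      $v0 1           # print integer: -3
        syscall
        li      $a0 32          # space character
        li      $v0 11
        syscall
        mfc1    $a0 $f4
        li      $v0 1           # print integer: -2
        syscall
        li      $v0 10          # exit
        syscall
```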

More complex instructions

Non-atomic conditional jumps
- comparison: c.le/lt/eq.s/d $f_source $f_target
  - stores 1/0 into a C1 flag (flag 0 by default, but there are others, like c.le.s 1 $f0 $f1)
  - ge/gt are lt/le with the operands swapped
- jump: bc1t/bc1f label
  - jump if C1 flag 0 is 1/0 (similarly bc1t 1 label for C1 flag 1)
```
   c.le.s $f0 $f1
   bc1t less
```
- Conditional moves:
  - movt/movf r_destination r_source: conditionally move a common register if C1 flag 0 is True/False (also movt $t0 $t1 2 for flag 2)
  - movt/movf.type f_destination f_source (+ optional flag number): the same for f-registers
```
   c.le.s $f0 $f1
   movt $t4 $t3
   movt.s $f1 $f0
```
- Also, conditional commands for common registers:
  - slt r_dest r_source r_target (set r_dest to 1 if r_source is less than r_target, to 0 otherwise); used in pseudoinstructions like blt $t0 $t1 label
  - movz/movn r_dest r_source r_target (set r_dest to r_source if r_target is zero/nonzero)
  - movz/movn.s/d f_dest f_source r_target (set f_dest to f_source if r_target is zero/nonzero)
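
The comparison and conditional move instructions above can be combined without any jumps. A sketch assuming MARS (syscalls 2, 11, 1, 10); the labels fx and fy are made up for this example. It prints the maximum of two floats and then the result of the test x >= y, written as y <= x:

```
.data
fx:     .float  2.5
fy:     .float  7.25
.text
        l.s     $f0 fx
        l.s     $f2 fy
        mov.s   $f12 $f0        # assume max = x
        c.lt.s  $f0 $f2         # C1 flag 0 = (x < y)
        movt.s  $f12 $f2        # if the flag is set, max = y
        li      $v0 2           # print float: 7.25
        syscall
        li      $a0 10          # newline character
        li      $v0 11
        syscall
        li      $t0 1
        li      $a0 0
        c.le.s  $f2 $f0         # x >= y written as y <= x
        movt    $a0 $t0         # $a0 = 1 only if the flag is set
        li      $v0 1           # print integer: 0, since 2.5 < 7.25
        syscall
        li      $v0 10          # exit
        syscall
```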

Examples

Calculate a square root from integer

```
.data
src:    .word   100
dst:    .float  0
idst:   .word   0
.text
        lw      $t0 src         # source integer
        mtc1    $t0 $f2         # store to FPU
        cvt.s.w $f2 $f2         # convert to single-sized float
        mtc1    $zero $f0       # zero in $f0 (no need to convert)
        c.lt.s  $f2 $f0         # check if <0 …
        bc1t    nosqrt          # no root then
        sqrt.s  $f2 $f2
nosqrt: s.s     $f2 dst         # store float result
        cvt.w.s $f2 $f2         # convert to integer
        mfc1    $t0 $f2         # get from FPU
        sw      $t0 idst        # store integer result
```

Caution: lt (as in c.lt.s) and 1t (as in bc1t) look almost the same and are easy to mix up

Calculate $$e$$ as the infinite sum $$sum_(n=0)^infty 1/(n!)$$

```
.data
one:    .double 1
ten:    .double 10
.text
        l.d     $f2 one         # constant 1
        sub.d   $f4 $f4 $f4     # n = 0
        mov.d   $f6 $f2         # n! = 1
        mov.d   $f8 $f2         # here will be e (starts with the n=0 summand)
        l.d     $f10 ten        # 10 for now, here will be ε
        mov.d   $f0 $f2         # power of ten, starts from 1
        li      $v0 5           # read decimal length K
        syscall

enext:  blez    $v0 edone       # $f0 = 10**K
        mul.d   $f0 $f0 $f10
        subi    $v0 $v0 1
        b       enext
edone:  div.d   $f10 $f2 $f0    # ε = 1/10**K

loop:   add.d   $f4 $f4 $f2     # n=n+1
        mul.d   $f6 $f6 $f4     # n!=(n-1)!*n
        div.d   $f0 $f2 $f6     # next summand
        add.d   $f8 $f8 $f0
        c.lt.d  $f0 $f10        # next summand < ε
        bc1f    loop

        li      $v0 3           # output a double
        mov.d   $f12 $f8        # $f12 by syscall standard
        syscall
```

H/W

EJudge: CubicRoot 'Cubical root'

Input a double float A (positive or negative), $$1<=|A|<=1000000$$, and $$0.00001<=varepsilon<=0.01$$. Calculate the cube root of A with an error $$<=varepsilon$$ (you do not need to round the result). HINT: you can always calculate the cube of something!
Input:
```
1000
0.0001
```
Output:
```
9.99995
```
EJudge: FractionTruncate 'Inexact fraction'

Input three cardinals: A, B and n. Output a double float F that has exactly n decimal places of A/B. You need to write a subroutine that accepts double f=A/B in $f12 and integer n in $a0 and returns the rounded double F in $f0. Hint: $$10^n*A/B < 2^31$$
Input:
```
123
456
7
```
Output:
```
0.2697368
```
EJudge: LeibPi 'Calculating Pi'

Calculate the value of π using Leibniz_formula_for_π accurate to N decimal places. Input N, output the result. Use the function defined in ../Homework_FractionTruncate to truncate the other digits. Keep in mind that the formula itself calculates π/4; you should probably start with 4 instead of 1 to get the required accuracy. Warning: the algorithm is slow; do not panic, but keep the code as simple as possible.
Input:
```
4
```
Output:
```
3.1416
```

HSE/ArchitectureASM/06_MathCoprocessor (last edited by FrBrGeorge 2019-11-30 23:34:06)
