NetBSD-Bugs archive


Re: port-sparc64/57472: OpenSSL broken, affects ecdsa keys in OpenSSH



The following reply was made to PR port-sparc64/57472; it has been noted by GNATS.

From: Martin Husemann <martin%duskware.de@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: port-sparc64/57472: OpenSSL broken, affects ecdsa keys in OpenSSH
Date: Mon, 26 Jun 2023 14:03:40 +0200

 So after more offlist discussion and further digging it turned out that
 for the old openssl we had (and still have on the branches) a
 configuration with 64bit BIGNUM limbs, but nearly all asm optimizations
 disabled.
 
 This is a "bn(64/64)" configuration in openssl terms. Upstream decided
 to switch to 32bit limbs instead, i.e. a "bn(64/32)" configuration -
 which results in a different internal binary storage layout for the BN
 limbs. The provided asm code can only deal with this layout (though it
 should be possible to conditionally swap some offsets around and use
 the 64bit layout w/o modifying the asm big time).
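 The layout difference can be sketched like this (a minimal
 illustration of the limb split, not OpenSSL's actual bn headers;
 BIGNUM limbs are stored least-significant first in both setups):
 
 ```c
 #include <assert.h>
 #include <stdint.h>
 #include <stdio.h>
 
 int main(void)
 {
 	/* The same 64-bit value, stored the two ways openssl can
 	   configure BIGNUM limbs. */
 	uint64_t v = 0x0123456789abcdefULL;
 
 	uint64_t limbs64[1] = { v };	/* bn(64,64): one 64-bit limb */
 	uint32_t limbs32[2] = {		/* bn(64,32): two 32-bit limbs, */
 		(uint32_t)v,		/* least-significant limb first */
 		(uint32_t)(v >> 32)
 	};
 
 	assert(limbs64[0] == 0x0123456789abcdefULL);
 	assert(limbs32[0] == 0x89abcdefU && limbs32[1] == 0x01234567U);
 
 	/* On a big-endian machine like sparc64 the two arrays differ
 	   in memory: the 64-bit limb starts with byte 0x01, the 32-bit
 	   limb array with byte 0x89.  Asm written for one layout reads
 	   the words in the wrong order when handed the other. */
 	printf("bn(64,64) limb:  %016llx\n", (unsigned long long)limbs64[0]);
 	printf("bn(64,32) limbs: %08x %08x\n", limbs32[0], limbs32[1]);
 	return 0;
 }
 ```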
 
 Now after the import we initially tried to make the bn(64/64) configuration
 work, but with some asm optimizations disabled.
 
 But it sounds a lot easier to follow upstream's native configuration:
 go for bn(64/32) and enable all asm optimizations.
 
 I did a run of openssl speed (on the same machine) with our own build
 of the old openssl in netbsd-10, then with basically what Harold
 suggested, and again with changes to get as close as possible to the
 native build.
 
 The numbers are:
 
 old openssl netbsd-10:
 
 OpenSSL 1.1.1t  7 Feb 2023
 NetBSD 10.0_BETA
 options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr) 
 gcc version 10.4.0 (NetBSD nb1 20220722) 
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 md2                  0.00         0.00         0.00         0.00         0.00         0.00 
 mdc2              2492.49k     3015.55k     3198.10k     3245.16k     3257.75k     3265.91k
 md4               6983.05k    24674.74k    68966.74k   125959.14k   165998.22k   170023.46k
 md5              13957.66k    37116.56k    72801.89k    96587.01k   106395.30k   107209.06k
 hmac(md5)         4430.39k    15330.32k    43019.14k    78563.25k   103336.23k   105701.29k
 sha1             12664.84k    32845.61k    61806.65k    79300.81k    86451.46k    87020.27k
 rmd160            4607.58k    13274.13k    29076.75k    41374.02k    47230.55k    47720.44k
 rc4              63349.01k    68325.00k    70515.41k    70920.33k    70407.65k    70418.54k
 des cbc          21293.41k    23008.66k    23708.41k    23883.69k    23996.30k    23999.02k
 des ede3          8692.32k     9091.81k     9223.99k     9255.12k     9272.47k     9275.19k
 idea cbc         15269.34k    16126.70k    16401.83k    16440.13k    16424.82k    16433.15k
 seed cbc         18933.80k    21739.74k    22537.02k    22789.95k    22962.26k    22834.18k
 rc2 cbc          13707.22k    14445.48k    14624.40k    14740.16k    14780.98k    14789.15k
 rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00         0.00 
 blowfish cbc     27367.95k    30036.67k    30749.09k    31055.53k    30832.95k    30911.87k
 cast cbc         23491.15k    25366.61k    25922.17k    26099.75k    26146.36k    26154.52k
 aes-128 cbc      28434.69k    36174.50k    38959.12k    39776.45k    39699.90k    39718.95k
 aes-192 cbc      25592.97k    31446.71k    33552.24k    34173.70k    34117.91k    34128.80k
 aes-256 cbc      23207.50k    27877.42k    29491.71k    29953.53k    29910.33k    29921.21k
 camellia-128 cbc    23668.30k    28328.74k    29906.41k    30311.42k    30451.92k    30438.31k
 camellia-192 cbc    19201.73k    22705.03k    23648.53k    23913.63k    23990.86k    23999.02k
 camellia-256 cbc    19438.93k    22695.00k    23655.93k    23918.05k    23993.58k    23999.02k
 sha256            4412.71k    10114.87k    17312.83k    21063.44k    22513.03k    22638.22k
 sha512            3023.19k    12197.95k    19403.78k    27490.49k    31292.90k    31619.49k
 whirlpool         3490.62k     7620.70k    12841.93k    15530.78k    16520.08k    16596.28k
 aes-128 ige      24407.77k    33821.09k    37521.35k    38610.24k    38662.97k    38701.08k
 aes-192 ige      22068.24k    29630.53k    32502.81k    33294.97k    33331.37k    33383.08k
 aes-256 ige      20324.01k    26430.92k    28664.60k    29281.98k    29311.57k    29349.68k
 ghash            53902.86k    67752.53k    72022.75k    73597.70k    74051.87k    74098.14k
 rand               663.23k     2518.71k     7777.26k    17577.96k    26973.72k    28285.69k
                   sign    verify    sign/s verify/s
 rsa  512 bits 0.002195s 0.000186s    455.6   5384.1
 rsa 1024 bits 0.012940s 0.000632s     77.3   1583.5
 rsa 2048 bits 0.085085s 0.002292s     11.8    436.2
 rsa 3072 bits 0.283333s 0.005543s      3.5    180.4
 rsa 4096 bits 0.580556s 0.008401s      1.7    119.0
 rsa 7680 bits 3.960000s 0.033003s      0.3     30.3
 rsa 15360 bits 30.500000s 0.129359s      0.0      7.7
                   sign    verify    sign/s verify/s
 dsa  512 bits 0.003202s 0.002552s    312.3    391.8
 dsa 1024 bits 0.009025s 0.008096s    110.8    123.5
 dsa 2048 bits 0.030151s 0.028466s     33.2     35.1
                               sign    verify    sign/s verify/s
  160 bits ecdsa (secp160r1)   0.0086s   0.0068s    116.8    147.0
  192 bits ecdsa (nistp192)   0.0094s   0.0068s    106.3    148.1
  224 bits ecdsa (nistp224)   0.0127s   0.0090s     79.0    110.7
  256 bits ecdsa (nistp256)   0.0163s   0.0114s     61.3     87.9
  384 bits ecdsa (nistp384)   0.0405s   0.0273s     24.7     36.6
  521 bits ecdsa (nistp521)   0.0815s   0.0535s     12.3     18.7
  163 bits ecdsa (nistk163)   0.0039s   0.0075s    257.5    133.9
  233 bits ecdsa (nistk233)   0.0056s   0.0106s    178.8     94.3
  283 bits ecdsa (nistk283)   0.0111s   0.0210s     89.8     47.6
  409 bits ecdsa (nistk409)   0.0235s   0.0440s     42.6     22.7
  571 bits ecdsa (nistk571)   0.0499s   0.0940s     20.0     10.6
  163 bits ecdsa (nistb163)   0.0041s   0.0079s    243.9    126.7
  233 bits ecdsa (nistb233)   0.0058s   0.0111s    171.5     89.8
  283 bits ecdsa (nistb283)   0.0120s   0.0227s     83.6     44.0
  409 bits ecdsa (nistb409)   0.0255s   0.0485s     39.3     20.6
  571 bits ecdsa (nistb571)   0.0558s   0.1045s     17.9      9.6
  256 bits ecdsa (brainpoolP256r1)   0.0190s   0.0154s     52.6     65.1
  256 bits ecdsa (brainpoolP256t1)   0.0190s   0.0140s     52.6     71.3
  384 bits ecdsa (brainpoolP384r1)   0.0559s   0.0438s     17.9     22.9
  384 bits ecdsa (brainpoolP384t1)   0.0557s   0.0399s     17.9     25.0
  512 bits ecdsa (brainpoolP512r1)   0.1171s   0.0905s      8.5     11.1
  512 bits ecdsa (brainpoolP512t1)   0.1170s   0.0818s      8.5     12.2
                               op      op/s
  160 bits ecdh (secp160r1)   0.0081s    123.6
  192 bits ecdh (nistp192)   0.0088s    114.2
  224 bits ecdh (nistp224)   0.0118s     84.7
  256 bits ecdh (nistp256)   0.0153s     65.2
  384 bits ecdh (nistp384)   0.0376s     26.6
  521 bits ecdh (nistp521)   0.0740s     13.5
  163 bits ecdh (nistk163)   0.0034s    292.7
  233 bits ecdh (nistk233)   0.0047s    211.2
  283 bits ecdh (nistk283)   0.0095s    105.0
  409 bits ecdh (nistk409)   0.0197s     50.8
  571 bits ecdh (nistk571)   0.0424s     23.6
  163 bits ecdh (nistb163)   0.0036s    275.7
  233 bits ecdh (nistb233)   0.0050s    199.6
  283 bits ecdh (nistb283)   0.0104s     96.5
  409 bits ecdh (nistb409)   0.0219s     45.6
  571 bits ecdh (nistb571)   0.0477s     21.0
  256 bits ecdh (brainpoolP256r1)   0.0181s     55.4
  256 bits ecdh (brainpoolP256t1)   0.0180s     55.5
  384 bits ecdh (brainpoolP384r1)   0.0531s     18.8
  384 bits ecdh (brainpoolP384t1)   0.0529s     18.9
  512 bits ecdh (brainpoolP512r1)   0.1114s      9.0
  512 bits ecdh (brainpoolP512t1)   0.1111s      9.0
  253 bits ecdh (X25519)   0.0018s    542.8
  448 bits ecdh (X448)   0.0091s    110.2
                               sign    verify    sign/s verify/s
  253 bits EdDSA (Ed25519)   0.0007s   0.0021s   1414.6    484.1
  456 bits EdDSA (Ed448)   0.0042s   0.0101s    237.2     98.7
 
 
 Then with -current and Harold's patch:
 
 version: 3.0.9
 NetBSD 10.99.4
 options: bn(64,64)
 gcc version 10.4.0 (NetBSD nb1 20220722) 
 CPUINFO: N/A
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 md5               5465.83k    18267.30k    48228.61k    82651.43k   103986.69k   105968.01k
 sha1              5218.63k    17039.10k    42995.84k    69575.19k    84851.16k    86198.35k
 rmd160            4353.84k    12775.61k    28323.21k    41072.27k    47178.84k    47687.78k
 sha256            2761.33k     7548.98k    15113.70k    20264.96k    22401.45k    22648.15k
 sha512            2123.77k     8537.17k    16601.64k    25948.02k    31042.51k    31483.41k
 hmac(md5)         3111.39k    11099.24k    33759.34k    69823.87k   100979.33k   104400.37k
 des-ede3          8164.56k     8977.73k     9180.11k     9253.08k     9272.47k     9275.19k
 aes-128-cbc      21386.94k    29187.08k    32217.39k    33099.69k    33358.59k    33377.64k
 aes-192-cbc      19566.42k    25997.76k    28536.83k    29216.09k    29325.18k    29338.79k
 aes-256-cbc      18122.49k    23488.36k    25458.56k    25999.73k    26162.69k    26176.30k
 camellia-128-cbc    18841.65k    23782.89k    25432.96k    25911.28k    26056.55k    26067.43k
 camellia-192-cbc    15786.77k    19808.98k    21059.10k    21391.73k    21492.43k    21495.15k
 camellia-256-cbc    16083.21k    19811.87k    21059.61k    21388.33k    21495.15k    21500.60k
 ghash            19137.00k    43202.32k    63001.94k    71003.00k    73717.11k    73907.63k
 rand              1138.31k     4072.40k    10236.32k    17126.84k    20989.60k    21244.77k
                   sign    verify    sign/s verify/s
 rsa  512 bits 0.002285s 0.000193s    437.6   5193.0
 rsa 1024 bits 0.013883s 0.000668s     72.0   1497.2
 rsa 2048 bits 0.091636s 0.002458s     10.9    406.9
 rsa 3072 bits 0.306061s 0.005970s      3.3    167.5
 rsa 4096 bits 0.625625s 0.009125s      1.6    109.6
 rsa 7680 bits 4.293333s 0.035857s      0.2     27.9
                   sign    verify    sign/s verify/s
 dsa  512 bits 0.003367s 0.002525s    297.0    396.1
 dsa 1024 bits 0.009597s 0.008667s    104.2    115.4
 dsa 2048 bits 0.032387s 0.031149s     30.9     32.1
                               sign    verify    sign/s verify/s
  160 bits ecdsa (secp160r1)   0.0085s   0.0068s    118.0    146.9
  192 bits ecdsa (nistp192)   0.0079s   0.0059s    127.2    168.2
  224 bits ecdsa (nistp224)   0.0111s   0.0081s     90.0    123.7
  256 bits ecdsa (nistp256)   0.0145s   0.0105s     69.0     95.4
  384 bits ecdsa (nistp384)   0.0381s   0.0259s     26.2     38.6
  521 bits ecdsa (nistp521)   0.0825s   0.0540s     12.1     18.5
  163 bits ecdsa (nistk163)   0.0037s   0.0069s    271.5    144.8
  233 bits ecdsa (nistk233)   0.0053s   0.0098s    188.0    102.1
  283 bits ecdsa (nistk283)   0.0108s   0.0202s     92.2     49.6
  409 bits ecdsa (nistk409)   0.0233s   0.0427s     43.0     23.4
  571 bits ecdsa (nistk571)   0.0497s   0.0918s     20.1     10.9
  163 bits ecdsa (nistb163)   0.0039s   0.0073s    258.2    136.5
  233 bits ecdsa (nistb233)   0.0056s   0.0103s    179.6     96.7
  283 bits ecdsa (nistb283)   0.0117s   0.0219s     85.2     45.7
  409 bits ecdsa (nistb409)   0.0252s   0.0468s     39.7     21.4
  571 bits ecdsa (nistb571)   0.0558s   0.1031s     17.9      9.7
  256 bits ecdsa (brainpoolP256r1)   0.0195s   0.0161s     51.2     62.3
  256 bits ecdsa (brainpoolP256t1)   0.0195s   0.0147s     51.3     67.8
  384 bits ecdsa (brainpoolP384r1)   0.0584s   0.0463s     17.1     21.6
  384 bits ecdsa (brainpoolP384t1)   0.0584s   0.0421s     17.1     23.7
  512 bits ecdsa (brainpoolP512r1)   0.1253s   0.0976s      8.0     10.2
  512 bits ecdsa (brainpoolP512t1)   0.1251s   0.0873s      8.0     11.5
                               op      op/s
  160 bits ecdh (secp160r1)   0.0080s    125.7
  192 bits ecdh (nistp192)   0.0073s    137.5
  224 bits ecdh (nistp224)   0.0102s     97.9
  256 bits ecdh (nistp256)   0.0134s     74.6
  384 bits ecdh (nistp384)   0.0352s     28.4
  521 bits ecdh (nistp521)   0.0747s     13.4
  163 bits ecdh (nistk163)   0.0032s    313.6
  233 bits ecdh (nistk233)   0.0044s    225.8
  283 bits ecdh (nistk283)   0.0092s    108.5
  409 bits ecdh (nistk409)   0.0193s     51.8
  571 bits ecdh (nistk571)   0.0419s     23.9
  163 bits ecdh (nistb163)   0.0034s    295.1
  233 bits ecdh (nistb233)   0.0047s    212.3
  283 bits ecdh (nistb283)   0.0101s     99.2
  409 bits ecdh (nistb409)   0.0215s     46.5
  571 bits ecdh (nistb571)   0.0471s     21.2
  256 bits ecdh (brainpoolP256r1)   0.0185s     54.0
  256 bits ecdh (brainpoolP256t1)   0.0185s     54.0
  384 bits ecdh (brainpoolP384r1)   0.0555s     18.0
  384 bits ecdh (brainpoolP384t1)   0.0554s     18.0
  512 bits ecdh (brainpoolP512r1)   0.1189s      8.4
  512 bits ecdh (brainpoolP512t1)   0.1188s      8.4
  253 bits ecdh (X25519)   0.0018s    543.6
  448 bits ecdh (X448)   0.0144s     69.2
                               sign    verify    sign/s verify/s
  253 bits EdDSA (Ed25519)   0.0007s   0.0021s   1352.9    478.2
  456 bits EdDSA (Ed448)   0.0058s   0.0160s    173.0     62.4
                               sign    verify    sign/s verify/s
  256 bits SM2 (CurveSM2)   0.0196s   0.0141s     50.9     70.9
                        op     op/s
 2048 bits ffdh   0.0415s     24.1
 3072 bits ffdh   0.1248s      8.0
 4096 bits ffdh   0.2251s      4.4
 6144 bits ffdh   0.5761s      1.7
 8192 bits ffdh   0.9800s      1.0
 
 
 And finally with a patch (see below) to mimic the bn(64/32) setup of the
 native build and enable all available asm:
 
 version: 3.0.9
 NetBSD 10.99.4
 options: bn(64,32)
 gcc version 10.4.0 (NetBSD nb1 20220722) 
 CPUINFO: N/A
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 md5               6466.60k    21046.60k    53163.38k    85940.48k   104658.92k   106327.26k
 sha1              6036.39k    19043.59k    45855.55k    71385.73k    85175.03k    86350.76k
 rmd160            4935.82k    14083.44k    29924.70k    41885.01k    47312.20k    47763.99k
 sha256            3056.19k     8045.08k    15498.97k    20365.35k    22425.94k    22615.38k
 sha512            2294.32k     9227.40k    17163.99k    26277.00k    31091.50k    31505.18k
 hmac(md5)         3517.74k    12459.80k    37030.27k    72952.01k   101866.57k   104841.27k
 des-ede3          8305.65k     9022.61k     9215.74k     9252.06k     9269.75k     9269.75k
 aes-128-cbc      19712.18k    27262.94k    30257.92k    31011.55k    31298.34k    31254.79k
 aes-192-cbc      18591.26k    24593.37k    26851.93k    27469.74k    27656.85k    27651.40k
 aes-256-cbc      17277.25k    22319.10k    24105.33k    24661.33k    24788.28k    24804.61k
 camellia-128-cbc    18780.06k    23831.07k    25476.42k    25924.21k    26056.55k    26067.43k
 camellia-192-cbc    16116.59k    19882.97k    21060.72k    21396.16k    21489.71k    21495.15k
 camellia-256-cbc    16105.67k    19854.46k    21063.10k    21392.75k    21489.71k    21534.04k
 ghash            19221.80k    43068.00k    62880.74k    70955.04k    73943.72k    73896.74k
 rand              1057.79k     3761.03k    10001.59k    16659.18k    20915.63k    21271.89k
                   sign    verify    sign/s verify/s
 rsa  512 bits 0.000948s 0.000074s   1054.5  13512.3
 rsa 1024 bits 0.005077s 0.000225s    197.0   4452.8
 rsa 2048 bits 0.030367s 0.000815s     32.9   1227.6
 rsa 3072 bits 0.094900s 0.001748s     10.5    572.1
 rsa 4096 bits 0.208889s 0.003169s      4.8    315.6
 rsa 7680 bits 1.273750s 0.010629s      0.8     94.1
                   sign    verify    sign/s verify/s
 dsa  512 bits 0.001367s 0.000981s    731.5   1019.7
 dsa 1024 bits 0.003325s 0.002871s    300.7    348.3
 dsa 2048 bits 0.010547s 0.009811s     94.8    101.9
                               sign    verify    sign/s verify/s
  160 bits ecdsa (secp160r1)   0.0039s   0.0032s    259.3    308.7
  192 bits ecdsa (nistp192)   0.0058s   0.0047s    172.7    213.6
  224 bits ecdsa (nistp224)   0.0081s   0.0065s    123.3    153.1
  256 bits ecdsa (nistp256)   0.0014s   0.0056s    693.5    177.0
  384 bits ecdsa (nistp384)   0.0224s   0.0169s     44.7     59.0
  521 bits ecdsa (nistp521)   0.0808s   0.0574s     12.4     17.4
  163 bits ecdsa (nistk163)   0.0047s   0.0092s    211.6    108.3
  233 bits ecdsa (nistk233)   0.0088s   0.0173s    113.7     57.8
  283 bits ecdsa (nistk283)   0.0164s   0.0322s     60.9     31.1
  409 bits ecdsa (nistk409)   0.0368s   0.0720s     27.1     13.9
  571 bits ecdsa (nistk571)   0.0864s   0.1703s     11.6      5.9
  163 bits ecdsa (nistb163)   0.0050s   0.0099s    198.1    100.7
  233 bits ecdsa (nistb233)   0.0097s   0.0190s    103.0     52.8
  283 bits ecdsa (nistb283)   0.0182s   0.0358s     54.9     27.9
  409 bits ecdsa (nistb409)   0.0418s   0.0820s     23.9     12.2
  571 bits ecdsa (nistb571)   0.0986s   0.1935s     10.1      5.2
  256 bits ecdsa (brainpoolP256r1)   0.0089s   0.0077s    111.9    129.2
  256 bits ecdsa (brainpoolP256t1)   0.0088s   0.0071s    113.7    140.5
  384 bits ecdsa (brainpoolP384r1)   0.0223s   0.0179s     44.9     55.9
  384 bits ecdsa (brainpoolP384t1)   0.0224s   0.0168s     44.6     59.4
  512 bits ecdsa (brainpoolP512r1)   0.0457s   0.0370s     21.9     27.0
  512 bits ecdsa (brainpoolP512t1)   0.0455s   0.0333s     22.0     30.0
                               op      op/s
  160 bits ecdh (secp160r1)   0.0036s    276.1
  192 bits ecdh (nistp192)   0.0055s    182.0
  224 bits ecdh (nistp224)   0.0077s    129.7
  256 bits ecdh (nistp256)   0.0046s    215.9
  384 bits ecdh (nistp384)   0.0212s     47.3
  521 bits ecdh (nistp521)   0.0768s     13.0
  163 bits ecdh (nistk163)   0.0045s    223.1
  233 bits ecdh (nistk233)   0.0084s    119.0
  283 bits ecdh (nistk283)   0.0157s     63.5
  409 bits ecdh (nistk409)   0.0351s     28.5
  571 bits ecdh (nistk571)   0.0832s     12.0
  163 bits ecdh (nistb163)   0.0048s    206.9
  233 bits ecdh (nistb233)   0.0093s    107.2
  283 bits ecdh (nistb283)   0.0175s     57.0
  409 bits ecdh (nistb409)   0.0402s     24.8
  571 bits ecdh (nistb571)   0.0955s     10.5
  256 bits ecdh (brainpoolP256r1)   0.0084s    118.8
  256 bits ecdh (brainpoolP256t1)   0.0085s    117.8
  384 bits ecdh (brainpoolP384r1)   0.0215s     46.5
  384 bits ecdh (brainpoolP384t1)   0.0213s     46.9
  512 bits ecdh (brainpoolP512r1)   0.0432s     23.1
  512 bits ecdh (brainpoolP512t1)   0.0430s     23.3
  253 bits ecdh (X25519)   0.0018s    543.0
  448 bits ecdh (X448)   0.0145s     69.2
                               sign    verify    sign/s verify/s
  253 bits EdDSA (Ed25519)   0.0007s   0.0021s   1383.5    480.0
  456 bits EdDSA (Ed448)   0.0058s   0.0159s    173.1     62.7
                               sign    verify    sign/s verify/s
  256 bits SM2 (CurveSM2)   0.0089s   0.0070s    111.8    142.0
                        op     op/s
 2048 bits ffdh   0.0134s     74.4
 3072 bits ffdh   0.0332s     30.1
 4096 bits ffdh   0.0703s     14.2
 6144 bits ffdh   0.1684s      5.9
 8192 bits ffdh   0.3217s      3.1
 
 
 The numbers are not conclusive for the hashes, but for the important
 public-key crypto tests the native-alike configuration is a clear winner.
 
 Examples:
 
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 sha256            4412.71k    10114.87k    17312.83k    21063.44k    22513.03k    22638.22k	<- 10
 sha256            2761.33k     7548.98k    15113.70k    20264.96k    22401.45k    22648.15k	<- 64/64 patch
 sha256            3056.19k     8045.08k    15498.97k    20365.35k    22425.94k    22615.38k	<- native-alike 64/32
 
 (I would have expected better results for the last row)
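 To quantify that (my own arithmetic on the sha256 rows above): the
 native-alike build roughly matches at large blocks but loses about 30%
 at 16-byte blocks, presumably due to per-call overhead dominating there.
 
 ```c
 #include <stdio.h>
 
 int main(void)
 {
 	/* sha256 throughput (1000s of bytes/s) from the rows above */
 	double old16  = 4412.71,  new16  = 3056.19;	/* 16-byte blocks */
 	double old16k = 22638.22, new16k = 22615.38;	/* 16384-byte blocks */
 
 	/* native-alike 64/32 relative to old netbsd-10 */
 	printf("16B:  %.1f%% of old\n", 100.0 * new16  / old16);
 	printf("16kB: %.1f%% of old\n", 100.0 * new16k / old16k);
 	return 0;
 }
 ```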
 
 
                   sign    verify    sign/s verify/s
 rsa 2048 bits 0.085085s 0.002292s     11.8    436.2	<- 10
 rsa 2048 bits 0.091636s 0.002458s     10.9    406.9	<- 64/64 patch
 rsa 2048 bits 0.030367s 0.000815s     32.9   1227.6	<- native-alike 64/32
 
 rsa 4096 bits 0.580556s 0.008401s      1.7    119.0	<- 10
 rsa 4096 bits 0.625625s 0.009125s      1.6    109.6	<- 64/64 patch
 rsa 4096 bits 0.208889s 0.003169s      4.8    315.6	<- native-alike 64/32
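 Note the two column pairs are just reciprocals of each other; checking
 them also makes the roughly 3x rsa speedup of the 64/32 build explicit
 (my own arithmetic on the rows above):
 
 ```c
 #include <stdio.h>
 
 int main(void)
 {
 	/* openssl speed reports seconds per op and ops per second;
 	   the columns are reciprocal. */
 	double sign_s = 0.030367;		/* rsa 2048 sign, 64/32 build */
 	printf("sign/s:  %.1f\n", 1.0 / sign_s);	/* matches 32.9 above */
 
 	/* speedup of the native-alike 64/32 build over the 64/64 patch */
 	double old_sign = 0.625625, new_sign = 0.208889;	/* rsa 4096 */
 	printf("speedup: %.2fx\n", old_sign / new_sign);
 	return 0;
 }
 ```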
 
 
 or for elliptic curve:
                               sign    verify    sign/s verify/s
  224 bits ecdsa (nistp224)   0.0127s   0.0090s     79.0    110.7	<- 10
  256 bits ecdsa (nistp256)   0.0163s   0.0114s     61.3     87.9	<- 10
 
  224 bits ecdsa (nistp224)   0.0111s   0.0081s     90.0    123.7	<- 64/64 patch
  256 bits ecdsa (nistp256)   0.0145s   0.0105s     69.0     95.4	<- 64/64 patch
 
  224 bits ecdsa (nistp224)   0.0081s   0.0065s    123.3    153.1	<- native-alike 64/32
  256 bits ecdsa (nistp256)   0.0014s   0.0056s    693.5    177.0	<- native-alike 64/32
 
 
 Martin
 
 
 
 Index: crypto/external/bsd/openssl/dist/crypto/ec/ec2_smpl.c
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/dist/crypto/ec/ec2_smpl.c,v
 retrieving revision 1.11
 diff -u -p -r1.11 ec2_smpl.c
 --- crypto/external/bsd/openssl/dist/crypto/ec/ec2_smpl.c	7 May 2023 18:40:18 -0000	1.11
 +++ crypto/external/bsd/openssl/dist/crypto/ec/ec2_smpl.c	26 Jun 2023 11:16:42 -0000
 @@ -19,6 +19,21 @@
  #include "crypto/bn.h"
  #include "ec_local.h"
  
 +#ifdef __sparc__
 +#if BN_BITS2 != 32
 +#error something wrong in the configuration
 +#endif
 +#ifdef SIXTY_FOUR_BIT_LONG
 +#error SIXTY_FOUR_BIT_LONG defined for sparc*
 +#endif
 +#ifdef SIXTY_FOUR_BIT
 +#error SIXTY_FOUR_BIT defined for sparc*
 +#endif
 +#ifndef THIRTY_TWO_BIT
 +#error THIRTY_TWO_BIT not defined for sparc*
 +#endif
 +#endif
 +
  #ifndef OPENSSL_NO_EC2M
  
  /*
 Index: crypto/external/bsd/openssl/include/crypto/bn_conf.h
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/include/crypto/bn_conf.h,v
 retrieving revision 1.2
 diff -u -p -r1.2 bn_conf.h
 --- crypto/external/bsd/openssl/include/crypto/bn_conf.h	7 May 2023 18:41:35 -0000	1.2
 +++ crypto/external/bsd/openssl/include/crypto/bn_conf.h	26 Jun 2023 11:16:43 -0000
 @@ -22,12 +22,15 @@
  /* Should we define BN_DIV2W here? */
  
  /* Only one for the following should be defined */
 -#ifdef _LP64
 +#if defined(_LP64) && !defined(__sparc64__)	/* sparc64 asm needs 32bit BN limbs */
  #define SIXTY_FOUR_BIT_LONG
  #elif _ILP64
  #define SIXTY_FOUR_BIT
  #else
  #define THIRTY_TWO_BIT
  #endif
 +#ifdef __sparc64__
 +#define BN_LLONG
 +#endif
  
  #endif
 Index: crypto/external/bsd/openssl/include/internal/bn_conf.h
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/include/internal/bn_conf.h,v
 retrieving revision 1.1
 diff -u -p -r1.1 bn_conf.h
 --- crypto/external/bsd/openssl/include/internal/bn_conf.h	8 Feb 2018 21:57:24 -0000	1.1
 +++ crypto/external/bsd/openssl/include/internal/bn_conf.h	26 Jun 2023 11:16:43 -0000
 @@ -21,12 +21,15 @@
  /* Should we define BN_DIV2W here? */
  
  /* Only one for the following should be defined */
 -#if _LP64
 +#if _LP64 && !__sparc64__	/* sparc64 asm needs 32bit BN limbs */
  # define SIXTY_FOUR_BIT_LONG
  #elif _ILP64
  # define SIXTY_FOUR_BIT
  #else
  # define THIRTY_TWO_BIT
  #endif
 +#ifdef __sparc64__
 +# define BN_LLONG
 +#endif
  
  #endif
 Index: crypto/external/bsd/openssl/include/openssl/configuration.h
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/include/openssl/configuration.h,v
 retrieving revision 1.3
 diff -u -p -r1.3 configuration.h
 --- crypto/external/bsd/openssl/include/openssl/configuration.h	11 May 2023 14:36:11 -0000	1.3
 +++ crypto/external/bsd/openssl/include/openssl/configuration.h	26 Jun 2023 11:16:43 -0000
 @@ -120,7 +120,7 @@ extern "C" {
  #  undef BN_LLONG
  /* Only one for the following should be defined */
  #  undef SIXTY_FOUR_BIT
 -#  ifdef __LP64__
 +#  if defined(__LP64__) && !defined(__sparc64__)	/* sparc64 asm needs 32bit BN limbs */
  #   define SIXTY_FOUR_BIT_LONG
  #   undef THIRTY_TWO_BIT
  #  else
 @@ -128,6 +128,9 @@ extern "C" {
  #   define THIRTY_TWO_BIT
  #  endif
  # endif
 +#ifdef __sparc64__
 +# define BN_LLONG
 +#endif
  
  # define RC4_INT unsigned int
  
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/Makefile
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/Makefile,v
 retrieving revision 1.6
 diff -u -p -r1.6 Makefile
 --- crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/Makefile	18 Feb 2018 23:38:47 -0000	1.6
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/Makefile	26 Jun 2023 11:16:43 -0000
 @@ -12,6 +12,7 @@ regen:
  		j=$$(basename $$i .pl).S; \
  		case $$j in \
  		sparc*_modes.pl|sha1-*) perl $$i $$j;; \
 +		sha512-*) perl $$i $$j; perl $$i $${j:S/512/256/};; \
  		*) perl $$i > $$j;; \
  		esac; \
  	done
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/crypto.inc
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/crypto.inc,v
 retrieving revision 1.9
 diff -u -p -r1.9 crypto.inc
 --- crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/crypto.inc	25 May 2023 15:52:29 -0000	1.9
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/crypto.inc	26 Jun 2023 11:16:43 -0000
 @@ -2,6 +2,9 @@
  CPUID_SRCS = sparcv9cap.c sparccpuid.S sparcv9-mont.S sparcv9a-mont.S
  CPUID_SRCS += sparct4-mont.S vis3-mont.S
  CPUID = yes
 -#CPPFLAGS += -DOPENSSL_BN_ASM_MONT
 +
 +CPPFLAGS += -DOPENSSL_BN_ASM_MONT
 +CPUID_SRCS+=bn_sparc.c
 +
  CPPFLAGS += -DOPENSSL_CPUID_OBJ
  .include "../../crypto.inc"
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/ec.inc
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/ec.inc,v
 retrieving revision 1.3
 diff -u -p -r1.3 ec.inc
 --- crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/ec.inc	25 May 2023 15:52:29 -0000	1.3
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/ec.inc	26 Jun 2023 11:16:43 -0000
 @@ -2,6 +2,7 @@
  EC_SRCS += \
  ecp_nistz256-sparcv9.S
  ECCPPFLAGS+= -DECP_NISTZ256_ASM
 -
 +ECCPPFLAGS+= -DOPENSSL_NO_EC_NISTP_64_GCC_128
  ECNI = yes
 +COPTS.bn_exp.c+=-Wno-error=stack-protector
  .include "../../ec.inc"
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha.inc
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha.inc,v
 retrieving revision 1.1
 diff -u -p -r1.1 sha.inc
 --- crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha.inc	2 Mar 2014 08:58:02 -0000	1.1
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha.inc	26 Jun 2023 11:16:43 -0000
 @@ -1,4 +1,4 @@
  .PATH.S: ${.PARSEDIR}
 -SHA_SRCS = sha1-sparcv9.S
 -SHACPPFLAGS = -DSHA1_ASM
 +SHA_SRCS = sha1-sparcv9.S sha256-sparcv9.S sha512-sparcv9.S 
 +SHACPPFLAGS = -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
  .include "../../sha.inc"
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha256-sparcv9.S
 ===================================================================
 RCS file: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha256-sparcv9.S
 diff -N crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha256-sparcv9.S
 --- /dev/null	1 Jan 1970 00:00:00 -0000
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha256-sparcv9.S	26 Jun 2023 11:16:43 -0000
 @@ -0,0 +1,1948 @@
 +#ifndef __ASSEMBLER__
 +# define __ASSEMBLER__ 1
 +#endif
 +#include "crypto/sparc_arch.h"
 +
 +#ifdef __arch64__
 +.register	%g2,#scratch
 +.register	%g3,#scratch
 +#endif
 +
 +.section	".text",#alloc,#execinstr
 +
 +.align	64
 +K256:
 +.type	K256,#object
 +	.long	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
 +	.long	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
 +	.long	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
 +	.long	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
 +	.long	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
 +	.long	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
 +	.long	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
 +	.long	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
 +	.long	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
 +	.long	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
 +	.long	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
 +	.long	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
 +	.long	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
 +	.long	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
 +	.long	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
 +	.long	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
 +.size	K256,.-K256
 +
 +#ifdef __PIC__
 +SPARC_PIC_THUNK(%g1)
 +#endif
 +
 +.globl	sha256_block_data_order
 +.align	32
 +sha256_block_data_order:
 +	SPARC_LOAD_ADDRESS_LEAF(OPENSSL_sparcv9cap_P,%g1,%g5)
 +	ld	[%g1+4],%g1		! OPENSSL_sparcv9cap_P[1]
 +
 +	andcc	%g1, CFR_SHA256, %g0
 +	be	.Lsoftware
 +	nop
 +	ld	[%o0 + 0x00], %f0
 +	ld	[%o0 + 0x04], %f1
 +	ld	[%o0 + 0x08], %f2
 +	ld	[%o0 + 0x0c], %f3
 +	ld	[%o0 + 0x10], %f4
 +	ld	[%o0 + 0x14], %f5
 +	andcc	%o1, 0x7, %g0
 +	ld	[%o0 + 0x18], %f6
 +	bne,pn	%icc, .Lhwunaligned
 +	 ld	[%o0 + 0x1c], %f7
 +
 +.Lhwloop:
 +	ldd	[%o1 + 0x00], %f8
 +	ldd	[%o1 + 0x08], %f10
 +	ldd	[%o1 + 0x10], %f12
 +	ldd	[%o1 + 0x18], %f14
 +	ldd	[%o1 + 0x20], %f16
 +	ldd	[%o1 + 0x28], %f18
 +	ldd	[%o1 + 0x30], %f20
 +	subcc	%o2, 1, %o2		! done yet?
 +	ldd	[%o1 + 0x38], %f22
 +	add	%o1, 0x40, %o1
 +	prefetch [%o1 + 63], 20
 +
 +	.word	0x81b02840		! SHA256
 +
 +	bne,pt	SIZE_T_CC, .Lhwloop
 +	nop
 +
 +.Lhwfinish:
 +	st	%f0, [%o0 + 0x00]	! store context
 +	st	%f1, [%o0 + 0x04]
 +	st	%f2, [%o0 + 0x08]
 +	st	%f3, [%o0 + 0x0c]
 +	st	%f4, [%o0 + 0x10]
 +	st	%f5, [%o0 + 0x14]
 +	st	%f6, [%o0 + 0x18]
 +	retl
 +	 st	%f7, [%o0 + 0x1c]
 +
 +.align	8
 +.Lhwunaligned:
 +	.word	0x93b24300 !alignaddr	%o1,%g0,%o1
 +
 +	ldd	[%o1 + 0x00], %f10
 +.Lhwunaligned_loop:
 +	ldd	[%o1 + 0x08], %f12
 +	ldd	[%o1 + 0x10], %f14
 +	ldd	[%o1 + 0x18], %f16
 +	ldd	[%o1 + 0x20], %f18
 +	ldd	[%o1 + 0x28], %f20
 +	ldd	[%o1 + 0x30], %f22
 +	ldd	[%o1 + 0x38], %f24
 +	subcc	%o2, 1, %o2		! done yet?
 +	ldd	[%o1 + 0x40], %f26
 +	add	%o1, 0x40, %o1
 +	prefetch [%o1 + 63], 20
 +
 +	.word	0x91b2890c !faligndata	%f10,%f12,%f8
 +	.word	0x95b3090e !faligndata	%f12,%f14,%f10
 +	.word	0x99b38910 !faligndata	%f14,%f16,%f12
 +	.word	0x9db40912 !faligndata	%f16,%f18,%f14
 +	.word	0xa1b48914 !faligndata	%f18,%f20,%f16
 +	.word	0xa5b50916 !faligndata	%f20,%f22,%f18
 +	.word	0xa9b58918 !faligndata	%f22,%f24,%f20
 +	.word	0xadb6091a !faligndata	%f24,%f26,%f22
 +
 +	.word	0x81b02840		! SHA256
 +
 +	bne,pt	SIZE_T_CC, .Lhwunaligned_loop
 +	.word	0x95b68f9a !for	%f26,%f26,%f10	! %f10=%f26
 +
 +	ba	.Lhwfinish
 +	nop
 +.align	16
 +.Lsoftware:
 +	save	%sp,-STACK_FRAME-0,%sp
 +	and	%i1,7,%i4
 +	sllx	%i2,6,%i2
 +	andn	%i1,7,%i1
 +	sll	%i4,3,%i4
 +	add	%i1,%i2,%i2
 +.Lpic:	call	.+8
 +	add	%o7,K256-.Lpic,%i3
 +
 +	ld	[%i0+0],%l0
 +	ld	[%i0+4],%l1
 +	ld	[%i0+8],%l2
 +	ld	[%i0+12],%l3
 +	ld	[%i0+16],%l4
 +	ld	[%i0+20],%l5
 +	ld	[%i0+24],%l6
 +	ld	[%i0+28],%l7
 +
 +.Lloop:
 +	ldx	[%i1+0],%o0
 +	ldx	[%i1+16],%o2
 +	ldx	[%i1+32],%o4
 +	ldx	[%i1+48],%g1
 +	ldx	[%i1+8],%o1
 +	ldx	[%i1+24],%o3
 +	subcc	%g0,%i4,%i5 ! should be 64-%i4, but -%i4 works too
 +	ldx	[%i1+40],%o5
 +	bz,pt	%icc,.Laligned
 +	ldx	[%i1+56],%o7
 +
 +	sllx	%o0,%i4,%o0
 +	ldx	[%i1+64],%g2
 +	srlx	%o1,%i5,%g4
 +	sllx	%o1,%i4,%o1
 +	or	%g4,%o0,%o0
 +	srlx	%o2,%i5,%g4
 +	sllx	%o2,%i4,%o2
 +	or	%g4,%o1,%o1
 +	srlx	%o3,%i5,%g4
 +	sllx	%o3,%i4,%o3
 +	or	%g4,%o2,%o2
 +	srlx	%o4,%i5,%g4
 +	sllx	%o4,%i4,%o4
 +	or	%g4,%o3,%o3
 +	srlx	%o5,%i5,%g4
 +	sllx	%o5,%i4,%o5
 +	or	%g4,%o4,%o4
 +	srlx	%g1,%i5,%g4
 +	sllx	%g1,%i4,%g1
 +	or	%g4,%o5,%o5
 +	srlx	%o7,%i5,%g4
 +	sllx	%o7,%i4,%o7
 +	or	%g4,%g1,%g1
 +	srlx	%g2,%i5,%g2
 +	or	%g2,%o7,%o7
 +.Laligned:
 +	srlx	%o0,32,%g2
 +	add	%l7,%g2,%g2
 +	srl	%l4,6,%l7	!! 0
 +	xor	%l5,%l6,%g5
 +	sll	%l4,7,%g4
 +	and	%l4,%g5,%g5
 +	srl	%l4,11,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,21,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l4,25,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,26,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%l6,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l7,%g3		! Sigma1(e)
 +
 +	srl	%l0,2,%l7
 +	add	%g5,%g2,%g2
 +	ld	[%i3+0],%g5	! K[0]
 +	sll	%l0,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l0,13,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,19,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l0,22,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,30,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%g4,%l7,%l7		! Sigma0(a)
 +
 +	or	%l0,%l1,%g3
 +	and	%l0,%l1,%g4
 +	and	%l2,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[0]
 +	add	%g4,%l7,%l7
 +
 +	add	%g2,%l3,%l3
 +	add	%g2,%l7,%l7
 +	add	%o0,%l6,%g2
 +	srl	%l3,6,%l6	!! 1
 +	xor	%l4,%l5,%g5
 +	sll	%l3,7,%g4
 +	and	%l3,%g5,%g5
 +	srl	%l3,11,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,21,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l3,25,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,26,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%l5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l6,%g3		! Sigma1(e)
 +
 +	srl	%l7,2,%l6
 +	add	%g5,%g2,%g2
 +	ld	[%i3+4],%g5	! K[1]
 +	sll	%l7,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l7,13,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,19,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l7,22,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,30,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%g4,%l6,%l6		! Sigma0(a)
 +
 +	or	%l7,%l0,%g3
 +	and	%l7,%l0,%g4
 +	and	%l1,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[1]
 +	add	%g4,%l6,%l6
 +
 +	add	%g2,%l2,%l2
 +	add	%g2,%l6,%l6
 +	srlx	%o1,32,%g2
 +	add	%l5,%g2,%g2
 +	srl	%l2,6,%l5	!! 2
 +	xor	%l3,%l4,%g5
 +	sll	%l2,7,%g4
 +	and	%l2,%g5,%g5
 +	srl	%l2,11,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,21,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l2,25,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,26,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%l4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l5,%g3		! Sigma1(e)
 +
 +	srl	%l6,2,%l5
 +	add	%g5,%g2,%g2
 +	ld	[%i3+8],%g5	! K[2]
 +	sll	%l6,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l6,13,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,19,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l6,22,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,30,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%g4,%l5,%l5		! Sigma0(a)
 +
 +	or	%l6,%l7,%g3
 +	and	%l6,%l7,%g4
 +	and	%l0,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[2]
 +	add	%g4,%l5,%l5
 +
 +	add	%g2,%l1,%l1
 +	add	%g2,%l5,%l5
 +	add	%o1,%l4,%g2
 +	srl	%l1,6,%l4	!! 3
 +	xor	%l2,%l3,%g5
 +	sll	%l1,7,%g4
 +	and	%l1,%g5,%g5
 +	srl	%l1,11,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,21,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l1,25,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,26,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%l3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l4,%g3		! Sigma1(e)
 +
 +	srl	%l5,2,%l4
 +	add	%g5,%g2,%g2
 +	ld	[%i3+12],%g5	! K[3]
 +	sll	%l5,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l5,13,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,19,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l5,22,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,30,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%g4,%l4,%l4		! Sigma0(a)
 +
 +	or	%l5,%l6,%g3
 +	and	%l5,%l6,%g4
 +	and	%l7,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[3]
 +	add	%g4,%l4,%l4
 +
 +	add	%g2,%l0,%l0
 +	add	%g2,%l4,%l4
 +	srlx	%o2,32,%g2
 +	add	%l3,%g2,%g2
 +	srl	%l0,6,%l3	!! 4
 +	xor	%l1,%l2,%g5
 +	sll	%l0,7,%g4
 +	and	%l0,%g5,%g5
 +	srl	%l0,11,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,21,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l0,25,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,26,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%l2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l3,%g3		! Sigma1(e)
 +
 +	srl	%l4,2,%l3
 +	add	%g5,%g2,%g2
 +	ld	[%i3+16],%g5	! K[4]
 +	sll	%l4,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l4,13,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,19,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l4,22,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,30,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%g4,%l3,%l3		! Sigma0(a)
 +
 +	or	%l4,%l5,%g3
 +	and	%l4,%l5,%g4
 +	and	%l6,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[4]
 +	add	%g4,%l3,%l3
 +
 +	add	%g2,%l7,%l7
 +	add	%g2,%l3,%l3
 +	add	%o2,%l2,%g2
 +	srl	%l7,6,%l2	!! 5
 +	xor	%l0,%l1,%g5
 +	sll	%l7,7,%g4
 +	and	%l7,%g5,%g5
 +	srl	%l7,11,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,21,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l7,25,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,26,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%l1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l2,%g3		! Sigma1(e)
 +
 +	srl	%l3,2,%l2
 +	add	%g5,%g2,%g2
 +	ld	[%i3+20],%g5	! K[5]
 +	sll	%l3,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l3,13,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,19,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l3,22,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,30,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%g4,%l2,%l2		! Sigma0(a)
 +
 +	or	%l3,%l4,%g3
 +	and	%l3,%l4,%g4
 +	and	%l5,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[5]
 +	add	%g4,%l2,%l2
 +
 +	add	%g2,%l6,%l6
 +	add	%g2,%l2,%l2
 +	srlx	%o3,32,%g2
 +	add	%l1,%g2,%g2
 +	srl	%l6,6,%l1	!! 6
 +	xor	%l7,%l0,%g5
 +	sll	%l6,7,%g4
 +	and	%l6,%g5,%g5
 +	srl	%l6,11,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,21,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l6,25,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,26,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%l0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l1,%g3		! Sigma1(e)
 +
 +	srl	%l2,2,%l1
 +	add	%g5,%g2,%g2
 +	ld	[%i3+24],%g5	! K[6]
 +	sll	%l2,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l2,13,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,19,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l2,22,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,30,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%g4,%l1,%l1		! Sigma0(a)
 +
 +	or	%l2,%l3,%g3
 +	and	%l2,%l3,%g4
 +	and	%l4,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[6]
 +	add	%g4,%l1,%l1
 +
 +	add	%g2,%l5,%l5
 +	add	%g2,%l1,%l1
 +	add	%o3,%l0,%g2
 +	srl	%l5,6,%l0	!! 7
 +	xor	%l6,%l7,%g5
 +	sll	%l5,7,%g4
 +	and	%l5,%g5,%g5
 +	srl	%l5,11,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,21,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l5,25,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,26,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%l7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l0,%g3		! Sigma1(e)
 +
 +	srl	%l1,2,%l0
 +	add	%g5,%g2,%g2
 +	ld	[%i3+28],%g5	! K[7]
 +	sll	%l1,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l1,13,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,19,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l1,22,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,30,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%g4,%l0,%l0		! Sigma0(a)
 +
 +	or	%l1,%l2,%g3
 +	and	%l1,%l2,%g4
 +	and	%l3,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[7]
 +	add	%g4,%l0,%l0
 +
 +	add	%g2,%l4,%l4
 +	add	%g2,%l0,%l0
 +	srlx	%o4,32,%g2
 +	add	%l7,%g2,%g2
 +	srl	%l4,6,%l7	!! 8
 +	xor	%l5,%l6,%g5
 +	sll	%l4,7,%g4
 +	and	%l4,%g5,%g5
 +	srl	%l4,11,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,21,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l4,25,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,26,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%l6,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l7,%g3		! Sigma1(e)
 +
 +	srl	%l0,2,%l7
 +	add	%g5,%g2,%g2
 +	ld	[%i3+32],%g5	! K[8]
 +	sll	%l0,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l0,13,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,19,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l0,22,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,30,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%g4,%l7,%l7		! Sigma0(a)
 +
 +	or	%l0,%l1,%g3
 +	and	%l0,%l1,%g4
 +	and	%l2,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[8]
 +	add	%g4,%l7,%l7
 +
 +	add	%g2,%l3,%l3
 +	add	%g2,%l7,%l7
 +	add	%o4,%l6,%g2
 +	srl	%l3,6,%l6	!! 9
 +	xor	%l4,%l5,%g5
 +	sll	%l3,7,%g4
 +	and	%l3,%g5,%g5
 +	srl	%l3,11,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,21,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l3,25,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,26,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%l5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l6,%g3		! Sigma1(e)
 +
 +	srl	%l7,2,%l6
 +	add	%g5,%g2,%g2
 +	ld	[%i3+36],%g5	! K[9]
 +	sll	%l7,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l7,13,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,19,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l7,22,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,30,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%g4,%l6,%l6		! Sigma0(a)
 +
 +	or	%l7,%l0,%g3
 +	and	%l7,%l0,%g4
 +	and	%l1,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[9]
 +	add	%g4,%l6,%l6
 +
 +	add	%g2,%l2,%l2
 +	add	%g2,%l6,%l6
 +	srlx	%o5,32,%g2
 +	add	%l5,%g2,%g2
 +	srl	%l2,6,%l5	!! 10
 +	xor	%l3,%l4,%g5
 +	sll	%l2,7,%g4
 +	and	%l2,%g5,%g5
 +	srl	%l2,11,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,21,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l2,25,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,26,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%l4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l5,%g3		! Sigma1(e)
 +
 +	srl	%l6,2,%l5
 +	add	%g5,%g2,%g2
 +	ld	[%i3+40],%g5	! K[10]
 +	sll	%l6,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l6,13,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,19,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l6,22,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,30,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%g4,%l5,%l5		! Sigma0(a)
 +
 +	or	%l6,%l7,%g3
 +	and	%l6,%l7,%g4
 +	and	%l0,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[10]
 +	add	%g4,%l5,%l5
 +
 +	add	%g2,%l1,%l1
 +	add	%g2,%l5,%l5
 +	add	%o5,%l4,%g2
 +	srl	%l1,6,%l4	!! 11
 +	xor	%l2,%l3,%g5
 +	sll	%l1,7,%g4
 +	and	%l1,%g5,%g5
 +	srl	%l1,11,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,21,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l1,25,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,26,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%l3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l4,%g3		! Sigma1(e)
 +
 +	srl	%l5,2,%l4
 +	add	%g5,%g2,%g2
 +	ld	[%i3+44],%g5	! K[11]
 +	sll	%l5,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l5,13,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,19,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l5,22,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,30,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%g4,%l4,%l4		! Sigma0(a)
 +
 +	or	%l5,%l6,%g3
 +	and	%l5,%l6,%g4
 +	and	%l7,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[11]
 +	add	%g4,%l4,%l4
 +
 +	add	%g2,%l0,%l0
 +	add	%g2,%l4,%l4
 +	srlx	%g1,32,%g2
 +	add	%l3,%g2,%g2
 +	srl	%l0,6,%l3	!! 12
 +	xor	%l1,%l2,%g5
 +	sll	%l0,7,%g4
 +	and	%l0,%g5,%g5
 +	srl	%l0,11,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,21,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l0,25,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,26,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%l2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l3,%g3		! Sigma1(e)
 +
 +	srl	%l4,2,%l3
 +	add	%g5,%g2,%g2
 +	ld	[%i3+48],%g5	! K[12]
 +	sll	%l4,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l4,13,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,19,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l4,22,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,30,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%g4,%l3,%l3		! Sigma0(a)
 +
 +	or	%l4,%l5,%g3
 +	and	%l4,%l5,%g4
 +	and	%l6,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[12]
 +	add	%g4,%l3,%l3
 +
 +	add	%g2,%l7,%l7
 +	add	%g2,%l3,%l3
 +	add	%g1,%l2,%g2
 +	srl	%l7,6,%l2	!! 13
 +	xor	%l0,%l1,%g5
 +	sll	%l7,7,%g4
 +	and	%l7,%g5,%g5
 +	srl	%l7,11,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,21,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l7,25,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,26,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%l1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l2,%g3		! Sigma1(e)
 +
 +	srl	%l3,2,%l2
 +	add	%g5,%g2,%g2
 +	ld	[%i3+52],%g5	! K[13]
 +	sll	%l3,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l3,13,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,19,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l3,22,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,30,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%g4,%l2,%l2		! Sigma0(a)
 +
 +	or	%l3,%l4,%g3
 +	and	%l3,%l4,%g4
 +	and	%l5,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[13]
 +	add	%g4,%l2,%l2
 +
 +	add	%g2,%l6,%l6
 +	add	%g2,%l2,%l2
 +	srlx	%o7,32,%g2
 +	add	%l1,%g2,%g2
 +	srl	%l6,6,%l1	!! 14
 +	xor	%l7,%l0,%g5
 +	sll	%l6,7,%g4
 +	and	%l6,%g5,%g5
 +	srl	%l6,11,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,21,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l6,25,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,26,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%l0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l1,%g3		! Sigma1(e)
 +
 +	srl	%l2,2,%l1
 +	add	%g5,%g2,%g2
 +	ld	[%i3+56],%g5	! K[14]
 +	sll	%l2,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l2,13,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,19,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l2,22,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,30,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%g4,%l1,%l1		! Sigma0(a)
 +
 +	or	%l2,%l3,%g3
 +	and	%l2,%l3,%g4
 +	and	%l4,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[14]
 +	add	%g4,%l1,%l1
 +
 +	add	%g2,%l5,%l5
 +	add	%g2,%l1,%l1
 +	add	%o7,%l0,%g2
 +	srl	%l5,6,%l0	!! 15
 +	xor	%l6,%l7,%g5
 +	sll	%l5,7,%g4
 +	and	%l5,%g5,%g5
 +	srl	%l5,11,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,21,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l5,25,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,26,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%l7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l0,%g3		! Sigma1(e)
 +
 +	srl	%l1,2,%l0
 +	add	%g5,%g2,%g2
 +	ld	[%i3+60],%g5	! K[15]
 +	sll	%l1,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l1,13,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,19,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l1,22,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,30,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%g4,%l0,%l0		! Sigma0(a)
 +
 +	or	%l1,%l2,%g3
 +	and	%l1,%l2,%g4
 +	and	%l3,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[15]
 +	add	%g4,%l0,%l0
 +
 +	add	%g2,%l4,%l4
 +	add	%g2,%l0,%l0
 +.L16_xx:
 +	srl	%o0,3,%g2		!! Xupdate(16)
 +	sll	%o0,14,%g4
 +	srl	%o0,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o0,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o7,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o0,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o4,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o0,0,%o0
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o0,%o0
 +	add	%l7,%g2,%g2
 +	srl	%l4,6,%l7	!! 16
 +	xor	%l5,%l6,%g5
 +	sll	%l4,7,%g4
 +	and	%l4,%g5,%g5
 +	srl	%l4,11,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,21,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l4,25,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,26,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%l6,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l7,%g3		! Sigma1(e)
 +
 +	srl	%l0,2,%l7
 +	add	%g5,%g2,%g2
 +	ld	[%i3+64],%g5	! K[16]
 +	sll	%l0,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l0,13,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,19,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l0,22,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,30,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%g4,%l7,%l7		! Sigma0(a)
 +
 +	or	%l0,%l1,%g3
 +	and	%l0,%l1,%g4
 +	and	%l2,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[16]
 +	add	%g4,%l7,%l7
 +
 +	add	%g2,%l3,%l3
 +	add	%g2,%l7,%l7
 +	srlx	%o1,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(17)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o7,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o7,13,%g4
 +	srl	%o7,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o7,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o5,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o0,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o0,%g2,%g2			! +=X[i]
 +	xor	%g3,%o0,%o0
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o0,%o0
 +	add	%l6,%g2,%g2
 +	srl	%l3,6,%l6	!! 17
 +	xor	%l4,%l5,%g5
 +	sll	%l3,7,%g4
 +	and	%l3,%g5,%g5
 +	srl	%l3,11,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,21,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l3,25,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,26,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%l5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l6,%g3		! Sigma1(e)
 +
 +	srl	%l7,2,%l6
 +	add	%g5,%g2,%g2
 +	ld	[%i3+68],%g5	! K[17]
 +	sll	%l7,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l7,13,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,19,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l7,22,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,30,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%g4,%l6,%l6		! Sigma0(a)
 +
 +	or	%l7,%l0,%g3
 +	and	%l7,%l0,%g4
 +	and	%l1,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[17]
 +	add	%g4,%l6,%l6
 +
 +	add	%g2,%l2,%l2
 +	add	%g2,%l6,%l6
 +	srl	%o1,3,%g2		!! Xupdate(18)
 +	sll	%o1,14,%g4
 +	srl	%o1,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o1,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o0,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o1,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o5,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o1,0,%o1
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o1,%o1
 +	add	%l5,%g2,%g2
 +	srl	%l2,6,%l5	!! 18
 +	xor	%l3,%l4,%g5
 +	sll	%l2,7,%g4
 +	and	%l2,%g5,%g5
 +	srl	%l2,11,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,21,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l2,25,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,26,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%l4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l5,%g3		! Sigma1(e)
 +
 +	srl	%l6,2,%l5
 +	add	%g5,%g2,%g2
 +	ld	[%i3+72],%g5	! K[18]
 +	sll	%l6,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l6,13,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,19,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l6,22,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,30,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%g4,%l5,%l5		! Sigma0(a)
 +
 +	or	%l6,%l7,%g3
 +	and	%l6,%l7,%g4
 +	and	%l0,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[18]
 +	add	%g4,%l5,%l5
 +
 +	add	%g2,%l1,%l1
 +	add	%g2,%l5,%l5
 +	srlx	%o2,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(19)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o0,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o0,13,%g4
 +	srl	%o0,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o0,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%g1,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o1,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o1,%g2,%g2			! +=X[i]
 +	xor	%g3,%o1,%o1
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o1,%o1
 +	add	%l4,%g2,%g2
 +	srl	%l1,6,%l4	!! 19
 +	xor	%l2,%l3,%g5
 +	sll	%l1,7,%g4
 +	and	%l1,%g5,%g5
 +	srl	%l1,11,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,21,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l1,25,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,26,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%l3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l4,%g3		! Sigma1(e)
 +
 +	srl	%l5,2,%l4
 +	add	%g5,%g2,%g2
 +	ld	[%i3+76],%g5	! K[19]
 +	sll	%l5,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l5,13,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,19,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l5,22,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,30,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%g4,%l4,%l4		! Sigma0(a)
 +
 +	or	%l5,%l6,%g3
 +	and	%l5,%l6,%g4
 +	and	%l7,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[19]
 +	add	%g4,%l4,%l4
 +
 +	add	%g2,%l0,%l0
 +	add	%g2,%l4,%l4
 +	srl	%o2,3,%g2		!! Xupdate(20)
 +	sll	%o2,14,%g4
 +	srl	%o2,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o2,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o1,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o2,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%g1,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o2,0,%o2
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o2,%o2
 +	add	%l3,%g2,%g2
 +	srl	%l0,6,%l3	!! 20
 +	xor	%l1,%l2,%g5
 +	sll	%l0,7,%g4
 +	and	%l0,%g5,%g5
 +	srl	%l0,11,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,21,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l0,25,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,26,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%l2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l3,%g3		! Sigma1(e)
 +
 +	srl	%l4,2,%l3
 +	add	%g5,%g2,%g2
 +	ld	[%i3+80],%g5	! K[20]
 +	sll	%l4,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l4,13,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,19,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l4,22,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,30,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%g4,%l3,%l3		! Sigma0(a)
 +
 +	or	%l4,%l5,%g3
 +	and	%l4,%l5,%g4
 +	and	%l6,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[20]
 +	add	%g4,%l3,%l3
 +
 +	add	%g2,%l7,%l7
 +	add	%g2,%l3,%l3
 +	srlx	%o3,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(21)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o1,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o1,13,%g4
 +	srl	%o1,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o1,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o7,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o2,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o2,%g2,%g2			! +=X[i]
 +	xor	%g3,%o2,%o2
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o2,%o2
 +	add	%l2,%g2,%g2
 +	srl	%l7,6,%l2	!! 21
 +	xor	%l0,%l1,%g5
 +	sll	%l7,7,%g4
 +	and	%l7,%g5,%g5
 +	srl	%l7,11,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,21,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l7,25,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,26,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%l1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l2,%g3		! Sigma1(e)
 +
 +	srl	%l3,2,%l2
 +	add	%g5,%g2,%g2
 +	ld	[%i3+84],%g5	! K[21]
 +	sll	%l3,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l3,13,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,19,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l3,22,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,30,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%g4,%l2,%l2		! Sigma0(a)
 +
 +	or	%l3,%l4,%g3
 +	and	%l3,%l4,%g4
 +	and	%l5,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[21]
 +	add	%g4,%l2,%l2
 +
 +	add	%g2,%l6,%l6
 +	add	%g2,%l2,%l2
 +	srl	%o3,3,%g2		!! Xupdate(22)
 +	sll	%o3,14,%g4
 +	srl	%o3,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o3,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o2,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o3,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o7,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o3,0,%o3
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o3,%o3
 +	add	%l1,%g2,%g2
 +	srl	%l6,6,%l1	!! 22
 +	xor	%l7,%l0,%g5
 +	sll	%l6,7,%g4
 +	and	%l6,%g5,%g5
 +	srl	%l6,11,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,21,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l6,25,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,26,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%l0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l1,%g3		! Sigma1(e)
 +
 +	srl	%l2,2,%l1
 +	add	%g5,%g2,%g2
 +	ld	[%i3+88],%g5	! K[22]
 +	sll	%l2,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l2,13,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,19,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l2,22,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,30,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%g4,%l1,%l1		! Sigma0(a)
 +
 +	or	%l2,%l3,%g3
 +	and	%l2,%l3,%g4
 +	and	%l4,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[22]
 +	add	%g4,%l1,%l1
 +
 +	add	%g2,%l5,%l5
 +	add	%g2,%l1,%l1
 +	srlx	%o4,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(23)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o2,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o2,13,%g4
 +	srl	%o2,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o2,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o0,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o3,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o3,%g2,%g2			! +=X[i]
 +	xor	%g3,%o3,%o3
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o3,%o3
 +	add	%l0,%g2,%g2
 +	srl	%l5,6,%l0	!! 23
 +	xor	%l6,%l7,%g5
 +	sll	%l5,7,%g4
 +	and	%l5,%g5,%g5
 +	srl	%l5,11,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,21,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l5,25,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,26,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%l7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l0,%g3		! Sigma1(e)
 +
 +	srl	%l1,2,%l0
 +	add	%g5,%g2,%g2
 +	ld	[%i3+92],%g5	! K[23]
 +	sll	%l1,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l1,13,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,19,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l1,22,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,30,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%g4,%l0,%l0		! Sigma0(a)
 +
 +	or	%l1,%l2,%g3
 +	and	%l1,%l2,%g4
 +	and	%l3,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[23]
 +	add	%g4,%l0,%l0
 +
 +	add	%g2,%l4,%l4
 +	add	%g2,%l0,%l0
 +	srl	%o4,3,%g2		!! Xupdate(24)
 +	sll	%o4,14,%g4
 +	srl	%o4,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o4,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o3,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o4,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o0,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o4,0,%o4
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o4,%o4
 +	add	%l7,%g2,%g2
 +	srl	%l4,6,%l7	!! 24
 +	xor	%l5,%l6,%g5
 +	sll	%l4,7,%g4
 +	and	%l4,%g5,%g5
 +	srl	%l4,11,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,21,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l4,25,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l4,26,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%l6,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l7,%g3		! Sigma1(e)
 +
 +	srl	%l0,2,%l7
 +	add	%g5,%g2,%g2
 +	ld	[%i3+96],%g5	! K[24]
 +	sll	%l0,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l0,13,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,19,%g4
 +	xor	%g3,%l7,%l7
 +	srl	%l0,22,%g3
 +	xor	%g4,%l7,%l7
 +	sll	%l0,30,%g4
 +	xor	%g3,%l7,%l7
 +	xor	%g4,%l7,%l7		! Sigma0(a)
 +
 +	or	%l0,%l1,%g3
 +	and	%l0,%l1,%g4
 +	and	%l2,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[24]
 +	add	%g4,%l7,%l7
 +
 +	add	%g2,%l3,%l3
 +	add	%g2,%l7,%l7
 +	srlx	%o5,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(25)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o3,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o3,13,%g4
 +	srl	%o3,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o3,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o1,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o4,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o4,%g2,%g2			! +=X[i]
 +	xor	%g3,%o4,%o4
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o4,%o4
 +	add	%l6,%g2,%g2
 +	srl	%l3,6,%l6	!! 25
 +	xor	%l4,%l5,%g5
 +	sll	%l3,7,%g4
 +	and	%l3,%g5,%g5
 +	srl	%l3,11,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,21,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l3,25,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l3,26,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%l5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l6,%g3		! Sigma1(e)
 +
 +	srl	%l7,2,%l6
 +	add	%g5,%g2,%g2
 +	ld	[%i3+100],%g5	! K[25]
 +	sll	%l7,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l7,13,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,19,%g4
 +	xor	%g3,%l6,%l6
 +	srl	%l7,22,%g3
 +	xor	%g4,%l6,%l6
 +	sll	%l7,30,%g4
 +	xor	%g3,%l6,%l6
 +	xor	%g4,%l6,%l6		! Sigma0(a)
 +
 +	or	%l7,%l0,%g3
 +	and	%l7,%l0,%g4
 +	and	%l1,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[25]
 +	add	%g4,%l6,%l6
 +
 +	add	%g2,%l2,%l2
 +	add	%g2,%l6,%l6
 +	srl	%o5,3,%g2		!! Xupdate(26)
 +	sll	%o5,14,%g4
 +	srl	%o5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o4,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o5,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o1,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o5,0,%o5
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o5,%o5
 +	add	%l5,%g2,%g2
 +	srl	%l2,6,%l5	!! 26
 +	xor	%l3,%l4,%g5
 +	sll	%l2,7,%g4
 +	and	%l2,%g5,%g5
 +	srl	%l2,11,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,21,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l2,25,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l2,26,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%l4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l5,%g3		! Sigma1(e)
 +
 +	srl	%l6,2,%l5
 +	add	%g5,%g2,%g2
 +	ld	[%i3+104],%g5	! K[26]
 +	sll	%l6,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l6,13,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,19,%g4
 +	xor	%g3,%l5,%l5
 +	srl	%l6,22,%g3
 +	xor	%g4,%l5,%l5
 +	sll	%l6,30,%g4
 +	xor	%g3,%l5,%l5
 +	xor	%g4,%l5,%l5		! Sigma0(a)
 +
 +	or	%l6,%l7,%g3
 +	and	%l6,%l7,%g4
 +	and	%l0,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[26]
 +	add	%g4,%l5,%l5
 +
 +	add	%g2,%l1,%l1
 +	add	%g2,%l5,%l5
 +	srlx	%g1,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(27)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o4,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o4,13,%g4
 +	srl	%o4,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o4,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o2,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o5,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o5,%g2,%g2			! +=X[i]
 +	xor	%g3,%o5,%o5
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o5,%o5
 +	add	%l4,%g2,%g2
 +	srl	%l1,6,%l4	!! 27
 +	xor	%l2,%l3,%g5
 +	sll	%l1,7,%g4
 +	and	%l1,%g5,%g5
 +	srl	%l1,11,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,21,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l1,25,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l1,26,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%l3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l4,%g3		! Sigma1(e)
 +
 +	srl	%l5,2,%l4
 +	add	%g5,%g2,%g2
 +	ld	[%i3+108],%g5	! K[27]
 +	sll	%l5,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l5,13,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,19,%g4
 +	xor	%g3,%l4,%l4
 +	srl	%l5,22,%g3
 +	xor	%g4,%l4,%l4
 +	sll	%l5,30,%g4
 +	xor	%g3,%l4,%l4
 +	xor	%g4,%l4,%l4		! Sigma0(a)
 +
 +	or	%l5,%l6,%g3
 +	and	%l5,%l6,%g4
 +	and	%l7,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[27]
 +	add	%g4,%l4,%l4
 +
 +	add	%g2,%l0,%l0
 +	add	%g2,%l4,%l4
 +	srl	%g1,3,%g2		!! Xupdate(28)
 +	sll	%g1,14,%g4
 +	srl	%g1,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%g1,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%o5,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%g1,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o2,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%g1,0,%g1
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%g1,%g1
 +	add	%l3,%g2,%g2
 +	srl	%l0,6,%l3	!! 28
 +	xor	%l1,%l2,%g5
 +	sll	%l0,7,%g4
 +	and	%l0,%g5,%g5
 +	srl	%l0,11,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,21,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l0,25,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l0,26,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%l2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l3,%g3		! Sigma1(e)
 +
 +	srl	%l4,2,%l3
 +	add	%g5,%g2,%g2
 +	ld	[%i3+112],%g5	! K[28]
 +	sll	%l4,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l4,13,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,19,%g4
 +	xor	%g3,%l3,%l3
 +	srl	%l4,22,%g3
 +	xor	%g4,%l3,%l3
 +	sll	%l4,30,%g4
 +	xor	%g3,%l3,%l3
 +	xor	%g4,%l3,%l3		! Sigma0(a)
 +
 +	or	%l4,%l5,%g3
 +	and	%l4,%l5,%g4
 +	and	%l6,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[28]
 +	add	%g4,%l3,%l3
 +
 +	add	%g2,%l7,%l7
 +	add	%g2,%l3,%l3
 +	srlx	%o7,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(29)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%o5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%o5,13,%g4
 +	srl	%o5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%o5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o3,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%g1,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%g1,%g2,%g2			! +=X[i]
 +	xor	%g3,%g1,%g1
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%g1,%g1
 +	add	%l2,%g2,%g2
 +	srl	%l7,6,%l2	!! 29
 +	xor	%l0,%l1,%g5
 +	sll	%l7,7,%g4
 +	and	%l7,%g5,%g5
 +	srl	%l7,11,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,21,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l7,25,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l7,26,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%l1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l2,%g3		! Sigma1(e)
 +
 +	srl	%l3,2,%l2
 +	add	%g5,%g2,%g2
 +	ld	[%i3+116],%g5	! K[29]
 +	sll	%l3,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l3,13,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,19,%g4
 +	xor	%g3,%l2,%l2
 +	srl	%l3,22,%g3
 +	xor	%g4,%l2,%l2
 +	sll	%l3,30,%g4
 +	xor	%g3,%l2,%l2
 +	xor	%g4,%l2,%l2		! Sigma0(a)
 +
 +	or	%l3,%l4,%g3
 +	and	%l3,%l4,%g4
 +	and	%l5,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[29]
 +	add	%g4,%l2,%l2
 +
 +	add	%g2,%l6,%l6
 +	add	%g2,%l2,%l2
 +	srl	%o7,3,%g2		!! Xupdate(30)
 +	sll	%o7,14,%g4
 +	srl	%o7,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%o7,18,%g3
 +	xor	%g4,%g2,%g2
 +	srlx	%g1,32,%i5
 +	srl	%i5,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%i5,13,%g4
 +	srl	%i5,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%i5,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o7,32,%g4		! X[i]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	add	%o3,%g2,%g2			! +=X[i+9]
 +	add	%g5,%g4,%g4
 +	srl	%o7,0,%o7
 +	add	%g4,%g2,%g2
 +
 +	sllx	%g2,32,%g3
 +	or	%g3,%o7,%o7
 +	add	%l1,%g2,%g2
 +	srl	%l6,6,%l1	!! 30
 +	xor	%l7,%l0,%g5
 +	sll	%l6,7,%g4
 +	and	%l6,%g5,%g5
 +	srl	%l6,11,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,21,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l6,25,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l6,26,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%l0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l1,%g3		! Sigma1(e)
 +
 +	srl	%l2,2,%l1
 +	add	%g5,%g2,%g2
 +	ld	[%i3+120],%g5	! K[30]
 +	sll	%l2,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l2,13,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,19,%g4
 +	xor	%g3,%l1,%l1
 +	srl	%l2,22,%g3
 +	xor	%g4,%l1,%l1
 +	sll	%l2,30,%g4
 +	xor	%g3,%l1,%l1
 +	xor	%g4,%l1,%l1		! Sigma0(a)
 +
 +	or	%l2,%l3,%g3
 +	and	%l2,%l3,%g4
 +	and	%l4,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[30]
 +	add	%g4,%l1,%l1
 +
 +	add	%g2,%l5,%l5
 +	add	%g2,%l1,%l1
 +	srlx	%o0,32,%i5
 +	srl	%i5,3,%g2		!! Xupdate(31)
 +	sll	%i5,14,%g4
 +	srl	%i5,7,%g3
 +	xor	%g4,%g2,%g2
 +	sll	%g4,11,%g4
 +	xor	%g3,%g2,%g2
 +	srl	%i5,18,%g3
 +	xor	%g4,%g2,%g2
 +	srl	%g1,10,%g5
 +	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 +	sll	%g1,13,%g4
 +	srl	%g1,17,%g3
 +	xor	%g4,%g5,%g5
 +	sll	%g4,2,%g4
 +	xor	%g3,%g5,%g5
 +	srl	%g1,19,%g3
 +	xor	%g4,%g5,%g5
 +	srlx	%o4,32,%g4	! X[i+9]
 +	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 +	srl	%o7,0,%g3
 +	add	%g5,%g4,%g4
 +	add	%o7,%g2,%g2			! +=X[i]
 +	xor	%g3,%o7,%o7
 +	add	%g4,%g2,%g2
 +
 +	srl	%g2,0,%g2
 +	or	%g2,%o7,%o7
 +	add	%l0,%g2,%g2
 +	srl	%l5,6,%l0	!! 31
 +	xor	%l6,%l7,%g5
 +	sll	%l5,7,%g4
 +	and	%l5,%g5,%g5
 +	srl	%l5,11,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,21,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l5,25,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l5,26,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%l7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%l0,%g3		! Sigma1(e)
 +
 +	srl	%l1,2,%l0
 +	add	%g5,%g2,%g2
 +	ld	[%i3+124],%g5	! K[31]
 +	sll	%l1,10,%g4
 +	add	%g3,%g2,%g2
 +	srl	%l1,13,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,19,%g4
 +	xor	%g3,%l0,%l0
 +	srl	%l1,22,%g3
 +	xor	%g4,%l0,%l0
 +	sll	%l1,30,%g4
 +	xor	%g3,%l0,%l0
 +	xor	%g4,%l0,%l0		! Sigma0(a)
 +
 +	or	%l1,%l2,%g3
 +	and	%l1,%l2,%g4
 +	and	%l3,%g3,%g3
 +	or	%g3,%g4,%g4	! Maj(a,b,c)
 +	add	%g5,%g2,%g2		! +=K[31]
 +	add	%g4,%l0,%l0
 +
 +	add	%g2,%l4,%l4
 +	add	%g2,%l0,%l0
 +	and	%g5,0xfff,%g5
 +	cmp	%g5,2290
 +	bne	.L16_xx
 +	add	%i3,64,%i3	! Ktbl+=16
 +
 +	ld	[%i0+0],%o0
 +	ld	[%i0+4],%o1
 +	ld	[%i0+8],%o2
 +	ld	[%i0+12],%o3
 +	ld	[%i0+16],%o4
 +	ld	[%i0+20],%o5
 +	ld	[%i0+24],%g1
 +	ld	[%i0+28],%o7
 +
 +	add	%l0,%o0,%l0
 +	st	%l0,[%i0+0]
 +	add	%l1,%o1,%l1
 +	st	%l1,[%i0+4]
 +	add	%l2,%o2,%l2
 +	st	%l2,[%i0+8]
 +	add	%l3,%o3,%l3
 +	st	%l3,[%i0+12]
 +	add	%l4,%o4,%l4
 +	st	%l4,[%i0+16]
 +	add	%l5,%o5,%l5
 +	st	%l5,[%i0+20]
 +	add	%l6,%g1,%l6
 +	st	%l6,[%i0+24]
 +	add	%l7,%o7,%l7
 +	st	%l7,[%i0+28]
 +	add	%i1,64,%i1		! advance inp
 +	cmp	%i1,%i2
 +	bne	SIZE_T_CC,.Lloop
 +	sub	%i3,192,%i3	! rewind Ktbl
 +
 +	ret
 +	restore
 +.type	sha256_block_data_order,#function
 +.size	sha256_block_data_order,(.-sha256_block_data_order)
 +.asciz	"SHA256 block transform for SPARCv9, CRYPTOGAMS by <appro@openssl.org>"
 +.align	4
 Index: crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha512-sparcv9.S
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha512-sparcv9.S,v
 retrieving revision 1.8
 diff -u -p -r1.8 sha512-sparcv9.S
 --- crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha512-sparcv9.S	9 May 2023 17:21:17 -0000	1.8
 +++ crypto/external/bsd/openssl/lib/libcrypto/arch/sparc64/sha512-sparcv9.S	26 Jun 2023 11:16:43 -0000
 @@ -11,1938 +11,2348 @@
  .section	".text",#alloc,#execinstr
  
  .align	64
 -K256:
 -.type	K256,#object
 -	.long	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
 -	.long	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
 -	.long	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
 -	.long	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
 -	.long	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
 -	.long	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
 -	.long	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
 -	.long	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
 -	.long	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
 -	.long	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
 -	.long	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
 -	.long	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
 -	.long	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
 -	.long	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
 -	.long	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
 -	.long	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
 -.size	K256,.-K256
 +K512:
 +.type	K512,#object
 +	.long	0x428a2f98,0xd728ae22, 0x71374491,0x23ef65cd
 +	.long	0xb5c0fbcf,0xec4d3b2f, 0xe9b5dba5,0x8189dbbc
 +	.long	0x3956c25b,0xf348b538, 0x59f111f1,0xb605d019
 +	.long	0x923f82a4,0xaf194f9b, 0xab1c5ed5,0xda6d8118
 +	.long	0xd807aa98,0xa3030242, 0x12835b01,0x45706fbe
 +	.long	0x243185be,0x4ee4b28c, 0x550c7dc3,0xd5ffb4e2
 +	.long	0x72be5d74,0xf27b896f, 0x80deb1fe,0x3b1696b1
 +	.long	0x9bdc06a7,0x25c71235, 0xc19bf174,0xcf692694
 +	.long	0xe49b69c1,0x9ef14ad2, 0xefbe4786,0x384f25e3
 +	.long	0x0fc19dc6,0x8b8cd5b5, 0x240ca1cc,0x77ac9c65
 +	.long	0x2de92c6f,0x592b0275, 0x4a7484aa,0x6ea6e483
 +	.long	0x5cb0a9dc,0xbd41fbd4, 0x76f988da,0x831153b5
 +	.long	0x983e5152,0xee66dfab, 0xa831c66d,0x2db43210
 +	.long	0xb00327c8,0x98fb213f, 0xbf597fc7,0xbeef0ee4
 +	.long	0xc6e00bf3,0x3da88fc2, 0xd5a79147,0x930aa725
 +	.long	0x06ca6351,0xe003826f, 0x14292967,0x0a0e6e70
 +	.long	0x27b70a85,0x46d22ffc, 0x2e1b2138,0x5c26c926
 +	.long	0x4d2c6dfc,0x5ac42aed, 0x53380d13,0x9d95b3df
 +	.long	0x650a7354,0x8baf63de, 0x766a0abb,0x3c77b2a8
 +	.long	0x81c2c92e,0x47edaee6, 0x92722c85,0x1482353b
 +	.long	0xa2bfe8a1,0x4cf10364, 0xa81a664b,0xbc423001
 +	.long	0xc24b8b70,0xd0f89791, 0xc76c51a3,0x0654be30
 +	.long	0xd192e819,0xd6ef5218, 0xd6990624,0x5565a910
 +	.long	0xf40e3585,0x5771202a, 0x106aa070,0x32bbd1b8
 +	.long	0x19a4c116,0xb8d2d0c8, 0x1e376c08,0x5141ab53
 +	.long	0x2748774c,0xdf8eeb99, 0x34b0bcb5,0xe19b48a8
 +	.long	0x391c0cb3,0xc5c95a63, 0x4ed8aa4a,0xe3418acb
 +	.long	0x5b9cca4f,0x7763e373, 0x682e6ff3,0xd6b2b8a3
 +	.long	0x748f82ee,0x5defb2fc, 0x78a5636f,0x43172f60
 +	.long	0x84c87814,0xa1f0ab72, 0x8cc70208,0x1a6439ec
 +	.long	0x90befffa,0x23631e28, 0xa4506ceb,0xde82bde9
 +	.long	0xbef9a3f7,0xb2c67915, 0xc67178f2,0xe372532b
 +	.long	0xca273ece,0xea26619c, 0xd186b8c7,0x21c0c207
 +	.long	0xeada7dd6,0xcde0eb1e, 0xf57d4f7f,0xee6ed178
 +	.long	0x06f067aa,0x72176fba, 0x0a637dc5,0xa2c898a6
 +	.long	0x113f9804,0xbef90dae, 0x1b710b35,0x131c471b
 +	.long	0x28db77f5,0x23047d84, 0x32caab7b,0x40c72493
 +	.long	0x3c9ebe0a,0x15c9bebc, 0x431d67c4,0x9c100d4c
 +	.long	0x4cc5d4be,0xcb3e42b6, 0x597f299c,0xfc657e2a
 +	.long	0x5fcb6fab,0x3ad6faec, 0x6c44198c,0x4a475817
 +.size	K512,.-K512
  
  #ifdef __PIC__
  SPARC_PIC_THUNK(%g1)
  #endif
  
 -.globl	sha256_block_data_order
 +.globl	sha512_block_data_order
  .align	32
 -sha256_block_data_order:
 +sha512_block_data_order:
  	SPARC_LOAD_ADDRESS_LEAF(OPENSSL_sparcv9cap_P,%g1,%g5)
  	ld	[%g1+4],%g1		! OPENSSL_sparcv9cap_P[1]
  
 -	andcc	%g1, CFR_SHA256, %g0
 +	andcc	%g1, CFR_SHA512, %g0
  	be	.Lsoftware
  	nop
 -	ld	[%o0 + 0x00], %f0
 -	ld	[%o0 + 0x04], %f1
 -	ld	[%o0 + 0x08], %f2
 -	ld	[%o0 + 0x0c], %f3
 -	ld	[%o0 + 0x10], %f4
 -	ld	[%o0 + 0x14], %f5
 +	ldd	[%o0 + 0x00], %f0	! load context
 +	ldd	[%o0 + 0x08], %f2
 +	ldd	[%o0 + 0x10], %f4
 +	ldd	[%o0 + 0x18], %f6
 +	ldd	[%o0 + 0x20], %f8
 +	ldd	[%o0 + 0x28], %f10
  	andcc	%o1, 0x7, %g0
 -	ld	[%o0 + 0x18], %f6
 +	ldd	[%o0 + 0x30], %f12
  	bne,pn	%icc, .Lhwunaligned
 -	 ld	[%o0 + 0x1c], %f7
 +	 ldd	[%o0 + 0x38], %f14
  
 -.Lhwloop:
 -	ldd	[%o1 + 0x00], %f8
 -	ldd	[%o1 + 0x08], %f10
 -	ldd	[%o1 + 0x10], %f12
 -	ldd	[%o1 + 0x18], %f14
 -	ldd	[%o1 + 0x20], %f16
 -	ldd	[%o1 + 0x28], %f18
 -	ldd	[%o1 + 0x30], %f20
 +.Lhwaligned_loop:
 +	ldd	[%o1 + 0x00], %f16
 +	ldd	[%o1 + 0x08], %f18
 +	ldd	[%o1 + 0x10], %f20
 +	ldd	[%o1 + 0x18], %f22
 +	ldd	[%o1 + 0x20], %f24
 +	ldd	[%o1 + 0x28], %f26
 +	ldd	[%o1 + 0x30], %f28
 +	ldd	[%o1 + 0x38], %f30
 +	ldd	[%o1 + 0x40], %f32
 +	ldd	[%o1 + 0x48], %f34
 +	ldd	[%o1 + 0x50], %f36
 +	ldd	[%o1 + 0x58], %f38
 +	ldd	[%o1 + 0x60], %f40
 +	ldd	[%o1 + 0x68], %f42
 +	ldd	[%o1 + 0x70], %f44
  	subcc	%o2, 1, %o2		! done yet?
 -	ldd	[%o1 + 0x38], %f22
 -	add	%o1, 0x40, %o1
 +	ldd	[%o1 + 0x78], %f46
 +	add	%o1, 0x80, %o1
  	prefetch [%o1 + 63], 20
 +	prefetch [%o1 + 64+63], 20
  
 -	.word	0x81b02840		! SHA256
 +	.word	0x81b02860		! SHA512
  
 -	bne,pt	SIZE_T_CC, .Lhwloop
 +	bne,pt	SIZE_T_CC, .Lhwaligned_loop
  	nop
  
  .Lhwfinish:
 -	st	%f0, [%o0 + 0x00]	! store context
 -	st	%f1, [%o0 + 0x04]
 -	st	%f2, [%o0 + 0x08]
 -	st	%f3, [%o0 + 0x0c]
 -	st	%f4, [%o0 + 0x10]
 -	st	%f5, [%o0 + 0x14]
 -	st	%f6, [%o0 + 0x18]
 +	std	%f0, [%o0 + 0x00]	! store context
 +	std	%f2, [%o0 + 0x08]
 +	std	%f4, [%o0 + 0x10]
 +	std	%f6, [%o0 + 0x18]
 +	std	%f8, [%o0 + 0x20]
 +	std	%f10, [%o0 + 0x28]
 +	std	%f12, [%o0 + 0x30]
  	retl
 -	 st	%f7, [%o0 + 0x1c]
 +	 std	%f14, [%o0 + 0x38]
  
 -.align	8
 +.align	16
  .Lhwunaligned:
  	.word	0x93b24300 !alignaddr	%o1,%g0,%o1
  
 -	ldd	[%o1 + 0x00], %f10
 +	ldd	[%o1 + 0x00], %f18
  .Lhwunaligned_loop:
 -	ldd	[%o1 + 0x08], %f12
 -	ldd	[%o1 + 0x10], %f14
 -	ldd	[%o1 + 0x18], %f16
 -	ldd	[%o1 + 0x20], %f18
 -	ldd	[%o1 + 0x28], %f20
 -	ldd	[%o1 + 0x30], %f22
 -	ldd	[%o1 + 0x38], %f24
 +	ldd	[%o1 + 0x08], %f20
 +	ldd	[%o1 + 0x10], %f22
 +	ldd	[%o1 + 0x18], %f24
 +	ldd	[%o1 + 0x20], %f26
 +	ldd	[%o1 + 0x28], %f28
 +	ldd	[%o1 + 0x30], %f30
 +	ldd	[%o1 + 0x38], %f32
 +	ldd	[%o1 + 0x40], %f34
 +	ldd	[%o1 + 0x48], %f36
 +	ldd	[%o1 + 0x50], %f38
 +	ldd	[%o1 + 0x58], %f40
 +	ldd	[%o1 + 0x60], %f42
 +	ldd	[%o1 + 0x68], %f44
 +	ldd	[%o1 + 0x70], %f46
 +	ldd	[%o1 + 0x78], %f48
  	subcc	%o2, 1, %o2		! done yet?
 -	ldd	[%o1 + 0x40], %f26
 -	add	%o1, 0x40, %o1
 +	ldd	[%o1 + 0x80], %f50
 +	add	%o1, 0x80, %o1
  	prefetch [%o1 + 63], 20
 +	prefetch [%o1 + 64+63], 20
  
 -	.word	0x91b2890c !faligndata	%f10,%f12,%f8
 -	.word	0x95b3090e !faligndata	%f12,%f14,%f10
 -	.word	0x99b38910 !faligndata	%f14,%f16,%f12
 -	.word	0x9db40912 !faligndata	%f16,%f18,%f14
  	.word	0xa1b48914 !faligndata	%f18,%f20,%f16
  	.word	0xa5b50916 !faligndata	%f20,%f22,%f18
  	.word	0xa9b58918 !faligndata	%f22,%f24,%f20
  	.word	0xadb6091a !faligndata	%f24,%f26,%f22
 +	.word	0xb1b6891c !faligndata	%f26,%f28,%f24
 +	.word	0xb5b7091e !faligndata	%f28,%f30,%f26
 +	.word	0xb9b78901 !faligndata	%f30,%f32,%f28
 +	.word	0xbdb04903 !faligndata	%f32,%f34,%f30
 +	.word	0x83b0c905 !faligndata	%f34,%f36,%f32
 +	.word	0x87b14907 !faligndata	%f36,%f38,%f34
 +	.word	0x8bb1c909 !faligndata	%f38,%f40,%f36
 +	.word	0x8fb2490b !faligndata	%f40,%f42,%f38
 +	.word	0x93b2c90d !faligndata	%f42,%f44,%f40
 +	.word	0x97b3490f !faligndata	%f44,%f46,%f42
 +	.word	0x9bb3c911 !faligndata	%f46,%f48,%f44
 +	.word	0x9fb44913 !faligndata	%f48,%f50,%f46
  
 -	.word	0x81b02840		! SHA256
 +	.word	0x81b02860		! SHA512
  
  	bne,pt	SIZE_T_CC, .Lhwunaligned_loop
 -	.word	0x95b68f9a !for	%f26,%f26,%f10	! %f10=%f26
 +	.word	0xa5b4cf93 !for	%f50,%f50,%f18	! %f18=%f50
  
  	ba	.Lhwfinish
  	nop
  .align	16
  .Lsoftware:
 -	save	%sp,-STACK_FRAME-0,%sp
 -	and	%i1,7,%i4
 -	sllx	%i2,6,%i2
 -	andn	%i1,7,%i1
 +	save	%sp,-STACK_FRAME-128,%sp
 +	and	%i1,3,%i4
 +	sllx	%i2,7,%i2
 +	andn	%i1,3,%i1
  	sll	%i4,3,%i4
  	add	%i1,%i2,%i2
 +	mov	32,%i5
 +	sub	%i5,%i4,%i5
  .Lpic:	call	.+8
 -	add	%o7,K256-.Lpic,%i3
 +	add	%o7,K512-.Lpic,%i3
  
 -	ld	[%i0+0],%l0
 -	ld	[%i0+4],%l1
 -	ld	[%i0+8],%l2
 -	ld	[%i0+12],%l3
 -	ld	[%i0+16],%l4
 -	ld	[%i0+20],%l5
 -	ld	[%i0+24],%l6
 -	ld	[%i0+28],%l7
 +	ldx	[%i0+0],%o0
 +	ldx	[%i0+8],%o1
 +	ldx	[%i0+16],%o2
 +	ldx	[%i0+24],%o3
 +	ldx	[%i0+32],%o4
 +	ldx	[%i0+40],%o5
 +	ldx	[%i0+48],%g1
 +	ldx	[%i0+56],%o7
  
  .Lloop:
 -	ldx	[%i1+0],%o0
 -	ldx	[%i1+16],%o2
 -	ldx	[%i1+32],%o4
 -	ldx	[%i1+48],%g1
 -	ldx	[%i1+8],%o1
 -	ldx	[%i1+24],%o3
 -	subcc	%g0,%i4,%i5 ! should be 64-%i4, but -%i4 works too
 -	ldx	[%i1+40],%o5
 -	bz,pt	%icc,.Laligned
 -	ldx	[%i1+56],%o7
 -
 -	sllx	%o0,%i4,%o0
 -	ldx	[%i1+64],%g2
 -	srlx	%o1,%i5,%g4
 -	sllx	%o1,%i4,%o1
 -	or	%g4,%o0,%o0
 -	srlx	%o2,%i5,%g4
 -	sllx	%o2,%i4,%o2
 -	or	%g4,%o1,%o1
 -	srlx	%o3,%i5,%g4
 -	sllx	%o3,%i4,%o3
 -	or	%g4,%o2,%o2
 -	srlx	%o4,%i5,%g4
 -	sllx	%o4,%i4,%o4
 -	or	%g4,%o3,%o3
 -	srlx	%o5,%i5,%g4
 -	sllx	%o5,%i4,%o5
 -	or	%g4,%o4,%o4
 -	srlx	%g1,%i5,%g4
 -	sllx	%g1,%i4,%g1
 -	or	%g4,%o5,%o5
 -	srlx	%o7,%i5,%g4
 -	sllx	%o7,%i4,%o7
 -	or	%g4,%g1,%g1
 -	srlx	%g2,%i5,%g2
 -	or	%g2,%o7,%o7
 -.Laligned:
 -	srlx	%o0,32,%g2
 -	add	%l7,%g2,%g2
 -	srl	%l4,6,%l7	!! 0
 -	xor	%l5,%l6,%g5
 -	sll	%l4,7,%g4
 -	and	%l4,%g5,%g5
 -	srl	%l4,11,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,21,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l4,25,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,26,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%l6,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l7,%g3		! Sigma1(e)
 -
 -	srl	%l0,2,%l7
 -	add	%g5,%g2,%g2
 -	ld	[%i3+0],%g5	! K[0]
 -	sll	%l0,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l0,13,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,19,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l0,22,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,30,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%g4,%l7,%l7		! Sigma0(a)
 -
 -	or	%l0,%l1,%g3
 -	and	%l0,%l1,%g4
 -	and	%l2,%g3,%g3
 +	ld	[%i1+0],%l0
 +	ld	[%i1+4],%l1
 +	ld	[%i1+8],%l2
 +	ld	[%i1+12],%l3
 +	ld	[%i1+16],%l4
 +	ld	[%i1+20],%l5
 +	ld	[%i1+24],%l6
 +	cmp	%i4,0
 +	ld	[%i1+28],%l7
 +	sllx	%l1,%i4,%g5	! Xload(0)
 +	add	%i4,32,%g3
 +	sllx	%l0,%g3,%g4
 +	ld	[%i1+32],%l0
 +	srlx	%l2,%i5,%l1
 +	or	%g4,%g5,%g5
 +	or	%l1,%g5,%g5
 +	ld	[%i1+36],%l1
 +	add	%o7,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+0]
 +	srlx	%o4,14,%o7	!! 0
 +	xor	%o5,%g1,%g5
 +	sllx	%o4,23,%g4
 +	and	%o4,%g5,%g5
 +	srlx	%o4,18,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,46,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o4,41,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,50,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o7,%g3		! Sigma1(e)
 +
 +	srlx	%o0,28,%o7
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+0],%g5	! K[0]
 +	sllx	%o0,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o0,34,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,30,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o0,39,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,36,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g4,%o7,%o7		! Sigma0(a)
 +
 +	or	%o0,%o1,%g3
 +	and	%o0,%o1,%g4
 +	and	%o2,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[0]
 -	add	%g4,%l7,%l7
 +	add	%g4,%o7,%o7
 +
 +	add	%g2,%o3,%o3
 +	add	%g2,%o7,%o7
 +	sllx	%l3,%i4,%g5	! Xload(1)
 +	add	%i4,32,%g3
 +	sllx	%l2,%g3,%g4
 +	ld	[%i1+40],%l2
 +	srlx	%l4,%i5,%l3
 +	or	%g4,%g5,%g5
 +	or	%l3,%g5,%g5
 +	ld	[%i1+44],%l3
 +	add	%g1,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+8]
 +	srlx	%o3,14,%g1	!! 1
 +	xor	%o4,%o5,%g5
 +	sllx	%o3,23,%g4
 +	and	%o3,%g5,%g5
 +	srlx	%o3,18,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,46,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o3,41,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,50,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%o5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%g1,%g3		! Sigma1(e)
  
 -	add	%g2,%l3,%l3
 -	add	%g2,%l7,%l7
 -	add	%o0,%l6,%g2
 -	srl	%l3,6,%l6	!! 1
 -	xor	%l4,%l5,%g5
 -	sll	%l3,7,%g4
 -	and	%l3,%g5,%g5
 -	srl	%l3,11,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,21,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l3,25,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,26,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%l5,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l6,%g3		! Sigma1(e)
 -
 -	srl	%l7,2,%l6
 -	add	%g5,%g2,%g2
 -	ld	[%i3+4],%g5	! K[1]
 -	sll	%l7,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l7,13,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,19,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l7,22,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,30,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%g4,%l6,%l6		! Sigma0(a)
 -
 -	or	%l7,%l0,%g3
 -	and	%l7,%l0,%g4
 -	and	%l1,%g3,%g3
 +	srlx	%o7,28,%g1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+8],%g5	! K[1]
 +	sllx	%o7,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o7,34,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,30,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o7,39,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,36,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%g4,%g1,%g1		! Sigma0(a)
 +
 +	or	%o7,%o0,%g3
 +	and	%o7,%o0,%g4
 +	and	%o1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[1]
 -	add	%g4,%l6,%l6
 +	add	%g4,%g1,%g1
 +
 +	add	%g2,%o2,%o2
 +	add	%g2,%g1,%g1
 +	sllx	%l5,%i4,%g5	! Xload(2)
 +	add	%i4,32,%g3
 +	sllx	%l4,%g3,%g4
 +	ld	[%i1+48],%l4
 +	srlx	%l6,%i5,%l5
 +	or	%g4,%g5,%g5
 +	or	%l5,%g5,%g5
 +	ld	[%i1+52],%l5
 +	add	%o5,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+16]
 +	srlx	%o2,14,%o5	!! 2
 +	xor	%o3,%o4,%g5
 +	sllx	%o2,23,%g4
 +	and	%o2,%g5,%g5
 +	srlx	%o2,18,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,46,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%o2,41,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,50,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%o4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o5,%g3		! Sigma1(e)
  
 -	add	%g2,%l2,%l2
 -	add	%g2,%l6,%l6
 -	srlx	%o1,32,%g2
 -	add	%l5,%g2,%g2
 -	srl	%l2,6,%l5	!! 2
 -	xor	%l3,%l4,%g5
 -	sll	%l2,7,%g4
 -	and	%l2,%g5,%g5
 -	srl	%l2,11,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,21,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l2,25,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,26,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%l4,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l5,%g3		! Sigma1(e)
 -
 -	srl	%l6,2,%l5
 -	add	%g5,%g2,%g2
 -	ld	[%i3+8],%g5	! K[2]
 -	sll	%l6,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l6,13,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,19,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l6,22,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,30,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%g4,%l5,%l5		! Sigma0(a)
 -
 -	or	%l6,%l7,%g3
 -	and	%l6,%l7,%g4
 -	and	%l0,%g3,%g3
 +	srlx	%g1,28,%o5
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+16],%g5	! K[2]
 +	sllx	%g1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%g1,34,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,30,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%g1,39,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,36,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%g4,%o5,%o5		! Sigma0(a)
 +
 +	or	%g1,%o7,%g3
 +	and	%g1,%o7,%g4
 +	and	%o0,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[2]
 -	add	%g4,%l5,%l5
 +	add	%g4,%o5,%o5
 +
 +	add	%g2,%o1,%o1
 +	add	%g2,%o5,%o5
 +	sllx	%l7,%i4,%g5	! Xload(3)
 +	add	%i4,32,%g3
 +	sllx	%l6,%g3,%g4
 +	ld	[%i1+56],%l6
 +	srlx	%l0,%i5,%l7
 +	or	%g4,%g5,%g5
 +	or	%l7,%g5,%g5
 +	ld	[%i1+60],%l7
 +	add	%o4,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+24]
 +	srlx	%o1,14,%o4	!! 3
 +	xor	%o2,%o3,%g5
 +	sllx	%o1,23,%g4
 +	and	%o1,%g5,%g5
 +	srlx	%o1,18,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,46,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o1,41,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,50,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%o3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o4,%g3		! Sigma1(e)
  
 -	add	%g2,%l1,%l1
 -	add	%g2,%l5,%l5
 -	add	%o1,%l4,%g2
 -	srl	%l1,6,%l4	!! 3
 -	xor	%l2,%l3,%g5
 -	sll	%l1,7,%g4
 -	and	%l1,%g5,%g5
 -	srl	%l1,11,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,21,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l1,25,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,26,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%l3,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l4,%g3		! Sigma1(e)
 -
 -	srl	%l5,2,%l4
 -	add	%g5,%g2,%g2
 -	ld	[%i3+12],%g5	! K[3]
 -	sll	%l5,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l5,13,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,19,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l5,22,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,30,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%g4,%l4,%l4		! Sigma0(a)
 -
 -	or	%l5,%l6,%g3
 -	and	%l5,%l6,%g4
 -	and	%l7,%g3,%g3
 +	srlx	%o5,28,%o4
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+24],%g5	! K[3]
 +	sllx	%o5,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o5,34,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,30,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o5,39,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,36,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%g4,%o4,%o4		! Sigma0(a)
 +
 +	or	%o5,%g1,%g3
 +	and	%o5,%g1,%g4
 +	and	%o7,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[3]
 -	add	%g4,%l4,%l4
 +	add	%g4,%o4,%o4
 +
 +	add	%g2,%o0,%o0
 +	add	%g2,%o4,%o4
 +	sllx	%l1,%i4,%g5	! Xload(4)
 +	add	%i4,32,%g3
 +	sllx	%l0,%g3,%g4
 +	ld	[%i1+64],%l0
 +	srlx	%l2,%i5,%l1
 +	or	%g4,%g5,%g5
 +	or	%l1,%g5,%g5
 +	ld	[%i1+68],%l1
 +	add	%o3,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+32]
 +	srlx	%o0,14,%o3	!! 4
 +	xor	%o1,%o2,%g5
 +	sllx	%o0,23,%g4
 +	and	%o0,%g5,%g5
 +	srlx	%o0,18,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,46,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o0,41,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,50,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%o2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o3,%g3		! Sigma1(e)
 +
 +	srlx	%o4,28,%o3
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+32],%g5	! K[4]
 +	sllx	%o4,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o4,34,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,30,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o4,39,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,36,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%g4,%o3,%o3		! Sigma0(a)
  
 -	add	%g2,%l0,%l0
 -	add	%g2,%l4,%l4
 -	srlx	%o2,32,%g2
 -	add	%l3,%g2,%g2
 -	srl	%l0,6,%l3	!! 4
 -	xor	%l1,%l2,%g5
 -	sll	%l0,7,%g4
 -	and	%l0,%g5,%g5
 -	srl	%l0,11,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,21,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l0,25,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,26,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%l2,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l3,%g3		! Sigma1(e)
 -
 -	srl	%l4,2,%l3
 -	add	%g5,%g2,%g2
 -	ld	[%i3+16],%g5	! K[4]
 -	sll	%l4,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l4,13,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,19,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l4,22,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,30,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%g4,%l3,%l3		! Sigma0(a)
 -
 -	or	%l4,%l5,%g3
 -	and	%l4,%l5,%g4
 -	and	%l6,%g3,%g3
 +	or	%o4,%o5,%g3
 +	and	%o4,%o5,%g4
 +	and	%g1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[4]
 -	add	%g4,%l3,%l3
 +	add	%g4,%o3,%o3
  
 -	add	%g2,%l7,%l7
 -	add	%g2,%l3,%l3
 -	add	%o2,%l2,%g2
 -	srl	%l7,6,%l2	!! 5
 -	xor	%l0,%l1,%g5
 -	sll	%l7,7,%g4
 -	and	%l7,%g5,%g5
 -	srl	%l7,11,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,21,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l7,25,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,26,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%l1,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l2,%g3		! Sigma1(e)
 -
 -	srl	%l3,2,%l2
 -	add	%g5,%g2,%g2
 -	ld	[%i3+20],%g5	! K[5]
 -	sll	%l3,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l3,13,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,19,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l3,22,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,30,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%g4,%l2,%l2		! Sigma0(a)
 -
 -	or	%l3,%l4,%g3
 -	and	%l3,%l4,%g4
 -	and	%l5,%g3,%g3
 +	add	%g2,%o7,%o7
 +	add	%g2,%o3,%o3
 +	sllx	%l3,%i4,%g5	! Xload(5)
 +	add	%i4,32,%g3
 +	sllx	%l2,%g3,%g4
 +	ld	[%i1+72],%l2
 +	srlx	%l4,%i5,%l3
 +	or	%g4,%g5,%g5
 +	or	%l3,%g5,%g5
 +	ld	[%i1+76],%l3
 +	add	%o2,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+40]
 +	srlx	%o7,14,%o2	!! 5
 +	xor	%o0,%o1,%g5
 +	sllx	%o7,23,%g4
 +	and	%o7,%g5,%g5
 +	srlx	%o7,18,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,46,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o7,41,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,50,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%o1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o2,%g3		! Sigma1(e)
 +
 +	srlx	%o3,28,%o2
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+40],%g5	! K[5]
 +	sllx	%o3,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o3,34,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,30,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o3,39,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,36,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%g4,%o2,%o2		! Sigma0(a)
 +
 +	or	%o3,%o4,%g3
 +	and	%o3,%o4,%g4
 +	and	%o5,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[5]
 -	add	%g4,%l2,%l2
 +	add	%g4,%o2,%o2
  
 -	add	%g2,%l6,%l6
 -	add	%g2,%l2,%l2
 -	srlx	%o3,32,%g2
 -	add	%l1,%g2,%g2
 -	srl	%l6,6,%l1	!! 6
 -	xor	%l7,%l0,%g5
 -	sll	%l6,7,%g4
 -	and	%l6,%g5,%g5
 -	srl	%l6,11,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,21,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l6,25,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,26,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%l0,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l1,%g3		! Sigma1(e)
 -
 -	srl	%l2,2,%l1
 -	add	%g5,%g2,%g2
 -	ld	[%i3+24],%g5	! K[6]
 -	sll	%l2,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l2,13,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,19,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l2,22,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,30,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%g4,%l1,%l1		! Sigma0(a)
 -
 -	or	%l2,%l3,%g3
 -	and	%l2,%l3,%g4
 -	and	%l4,%g3,%g3
 +	add	%g2,%g1,%g1
 +	add	%g2,%o2,%o2
 +	sllx	%l5,%i4,%g5	! Xload(6)
 +	add	%i4,32,%g3
 +	sllx	%l4,%g3,%g4
 +	ld	[%i1+80],%l4
 +	srlx	%l6,%i5,%l5
 +	or	%g4,%g5,%g5
 +	or	%l5,%g5,%g5
 +	ld	[%i1+84],%l5
 +	add	%o1,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+48]
 +	srlx	%g1,14,%o1	!! 6
 +	xor	%o7,%o0,%g5
 +	sllx	%g1,23,%g4
 +	and	%g1,%g5,%g5
 +	srlx	%g1,18,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,46,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%g1,41,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,50,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%o0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o1,%g3		! Sigma1(e)
 +
 +	srlx	%o2,28,%o1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+48],%g5	! K[6]
 +	sllx	%o2,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o2,34,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,30,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%o2,39,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,36,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%g4,%o1,%o1		! Sigma0(a)
 +
 +	or	%o2,%o3,%g3
 +	and	%o2,%o3,%g4
 +	and	%o4,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[6]
 -	add	%g4,%l1,%l1
 +	add	%g4,%o1,%o1
 +
 +	add	%g2,%o5,%o5
 +	add	%g2,%o1,%o1
 +	sllx	%l7,%i4,%g5	! Xload(7)
 +	add	%i4,32,%g3
 +	sllx	%l6,%g3,%g4
 +	ld	[%i1+88],%l6
 +	srlx	%l0,%i5,%l7
 +	or	%g4,%g5,%g5
 +	or	%l7,%g5,%g5
 +	ld	[%i1+92],%l7
 +	add	%o0,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+56]
 +	srlx	%o5,14,%o0	!! 7
 +	xor	%g1,%o7,%g5
 +	sllx	%o5,23,%g4
 +	and	%o5,%g5,%g5
 +	srlx	%o5,18,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,46,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o5,41,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,50,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%o7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o0,%g3		! Sigma1(e)
 +
 +	srlx	%o1,28,%o0
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+56],%g5	! K[7]
 +	sllx	%o1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o1,34,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,30,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o1,39,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,36,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%g4,%o0,%o0		! Sigma0(a)
  
 -	add	%g2,%l5,%l5
 -	add	%g2,%l1,%l1
 -	add	%o3,%l0,%g2
 -	srl	%l5,6,%l0	!! 7
 -	xor	%l6,%l7,%g5
 -	sll	%l5,7,%g4
 -	and	%l5,%g5,%g5
 -	srl	%l5,11,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,21,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l5,25,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,26,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%l7,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l0,%g3		! Sigma1(e)
 -
 -	srl	%l1,2,%l0
 -	add	%g5,%g2,%g2
 -	ld	[%i3+28],%g5	! K[7]
 -	sll	%l1,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l1,13,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,19,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l1,22,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,30,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%g4,%l0,%l0		! Sigma0(a)
 -
 -	or	%l1,%l2,%g3
 -	and	%l1,%l2,%g4
 -	and	%l3,%g3,%g3
 +	or	%o1,%o2,%g3
 +	and	%o1,%o2,%g4
 +	and	%o3,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[7]
 -	add	%g4,%l0,%l0
 +	add	%g4,%o0,%o0
 +
 +	add	%g2,%o4,%o4
 +	add	%g2,%o0,%o0
 +	sllx	%l1,%i4,%g5	! Xload(8)
 +	add	%i4,32,%g3
 +	sllx	%l0,%g3,%g4
 +	ld	[%i1+96],%l0
 +	srlx	%l2,%i5,%l1
 +	or	%g4,%g5,%g5
 +	or	%l1,%g5,%g5
 +	ld	[%i1+100],%l1
 +	add	%o7,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+64]
 +	srlx	%o4,14,%o7	!! 8
 +	xor	%o5,%g1,%g5
 +	sllx	%o4,23,%g4
 +	and	%o4,%g5,%g5
 +	srlx	%o4,18,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,46,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o4,41,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,50,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o7,%g3		! Sigma1(e)
 +
 +	srlx	%o0,28,%o7
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+64],%g5	! K[8]
 +	sllx	%o0,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o0,34,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,30,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o0,39,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,36,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g4,%o7,%o7		! Sigma0(a)
  
 -	add	%g2,%l4,%l4
 -	add	%g2,%l0,%l0
 -	srlx	%o4,32,%g2
 -	add	%l7,%g2,%g2
 -	srl	%l4,6,%l7	!! 8
 -	xor	%l5,%l6,%g5
 -	sll	%l4,7,%g4
 -	and	%l4,%g5,%g5
 -	srl	%l4,11,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,21,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l4,25,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,26,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%l6,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l7,%g3		! Sigma1(e)
 -
 -	srl	%l0,2,%l7
 -	add	%g5,%g2,%g2
 -	ld	[%i3+32],%g5	! K[8]
 -	sll	%l0,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l0,13,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,19,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l0,22,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,30,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%g4,%l7,%l7		! Sigma0(a)
 -
 -	or	%l0,%l1,%g3
 -	and	%l0,%l1,%g4
 -	and	%l2,%g3,%g3
 +	or	%o0,%o1,%g3
 +	and	%o0,%o1,%g4
 +	and	%o2,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[8]
 -	add	%g4,%l7,%l7
 +	add	%g4,%o7,%o7
 +
 +	add	%g2,%o3,%o3
 +	add	%g2,%o7,%o7
 +	sllx	%l3,%i4,%g5	! Xload(9)
 +	add	%i4,32,%g3
 +	sllx	%l2,%g3,%g4
 +	ld	[%i1+104],%l2
 +	srlx	%l4,%i5,%l3
 +	or	%g4,%g5,%g5
 +	or	%l3,%g5,%g5
 +	ld	[%i1+108],%l3
 +	add	%g1,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+72]
 +	srlx	%o3,14,%g1	!! 9
 +	xor	%o4,%o5,%g5
 +	sllx	%o3,23,%g4
 +	and	%o3,%g5,%g5
 +	srlx	%o3,18,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,46,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o3,41,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,50,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%o5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%g1,%g3		! Sigma1(e)
 +
 +	srlx	%o7,28,%g1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+72],%g5	! K[9]
 +	sllx	%o7,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o7,34,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,30,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o7,39,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,36,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%g4,%g1,%g1		! Sigma0(a)
  
 -	add	%g2,%l3,%l3
 -	add	%g2,%l7,%l7
 -	add	%o4,%l6,%g2
 -	srl	%l3,6,%l6	!! 9
 -	xor	%l4,%l5,%g5
 -	sll	%l3,7,%g4
 -	and	%l3,%g5,%g5
 -	srl	%l3,11,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,21,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l3,25,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,26,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%l5,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l6,%g3		! Sigma1(e)
 -
 -	srl	%l7,2,%l6
 -	add	%g5,%g2,%g2
 -	ld	[%i3+36],%g5	! K[9]
 -	sll	%l7,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l7,13,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,19,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l7,22,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,30,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%g4,%l6,%l6		! Sigma0(a)
 -
 -	or	%l7,%l0,%g3
 -	and	%l7,%l0,%g4
 -	and	%l1,%g3,%g3
 +	or	%o7,%o0,%g3
 +	and	%o7,%o0,%g4
 +	and	%o1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[9]
 -	add	%g4,%l6,%l6
 +	add	%g4,%g1,%g1
  
 -	add	%g2,%l2,%l2
 -	add	%g2,%l6,%l6
 -	srlx	%o5,32,%g2
 -	add	%l5,%g2,%g2
 -	srl	%l2,6,%l5	!! 10
 -	xor	%l3,%l4,%g5
 -	sll	%l2,7,%g4
 -	and	%l2,%g5,%g5
 -	srl	%l2,11,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,21,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l2,25,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,26,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%l4,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l5,%g3		! Sigma1(e)
 -
 -	srl	%l6,2,%l5
 -	add	%g5,%g2,%g2
 -	ld	[%i3+40],%g5	! K[10]
 -	sll	%l6,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l6,13,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,19,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l6,22,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,30,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%g4,%l5,%l5		! Sigma0(a)
 -
 -	or	%l6,%l7,%g3
 -	and	%l6,%l7,%g4
 -	and	%l0,%g3,%g3
 +	add	%g2,%o2,%o2
 +	add	%g2,%g1,%g1
 +	sllx	%l5,%i4,%g5	! Xload(10)
 +	add	%i4,32,%g3
 +	sllx	%l4,%g3,%g4
 +	ld	[%i1+112],%l4
 +	srlx	%l6,%i5,%l5
 +	or	%g4,%g5,%g5
 +	or	%l5,%g5,%g5
 +	ld	[%i1+116],%l5
 +	add	%o5,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+80]
 +	srlx	%o2,14,%o5	!! 10
 +	xor	%o3,%o4,%g5
 +	sllx	%o2,23,%g4
 +	and	%o2,%g5,%g5
 +	srlx	%o2,18,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,46,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%o2,41,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,50,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%o4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o5,%g3		! Sigma1(e)
 +
 +	srlx	%g1,28,%o5
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+80],%g5	! K[10]
 +	sllx	%g1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%g1,34,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,30,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%g1,39,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,36,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%g4,%o5,%o5		! Sigma0(a)
 +
 +	or	%g1,%o7,%g3
 +	and	%g1,%o7,%g4
 +	and	%o0,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[10]
 -	add	%g4,%l5,%l5
 +	add	%g4,%o5,%o5
 +
 +	add	%g2,%o1,%o1
 +	add	%g2,%o5,%o5
 +	sllx	%l7,%i4,%g5	! Xload(11)
 +	add	%i4,32,%g3
 +	sllx	%l6,%g3,%g4
 +	ld	[%i1+120],%l6
 +	srlx	%l0,%i5,%l7
 +	or	%g4,%g5,%g5
 +	or	%l7,%g5,%g5
 +	ld	[%i1+124],%l7
 +	add	%o4,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+88]
 +	srlx	%o1,14,%o4	!! 11
 +	xor	%o2,%o3,%g5
 +	sllx	%o1,23,%g4
 +	and	%o1,%g5,%g5
 +	srlx	%o1,18,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,46,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o1,41,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,50,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%o3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o4,%g3		! Sigma1(e)
  
 -	add	%g2,%l1,%l1
 -	add	%g2,%l5,%l5
 -	add	%o5,%l4,%g2
 -	srl	%l1,6,%l4	!! 11
 -	xor	%l2,%l3,%g5
 -	sll	%l1,7,%g4
 -	and	%l1,%g5,%g5
 -	srl	%l1,11,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,21,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l1,25,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,26,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%l3,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l4,%g3		! Sigma1(e)
 -
 -	srl	%l5,2,%l4
 -	add	%g5,%g2,%g2
 -	ld	[%i3+44],%g5	! K[11]
 -	sll	%l5,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l5,13,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,19,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l5,22,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,30,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%g4,%l4,%l4		! Sigma0(a)
 -
 -	or	%l5,%l6,%g3
 -	and	%l5,%l6,%g4
 -	and	%l7,%g3,%g3
 +	srlx	%o5,28,%o4
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+88],%g5	! K[11]
 +	sllx	%o5,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o5,34,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,30,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o5,39,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,36,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%g4,%o4,%o4		! Sigma0(a)
 +
 +	or	%o5,%g1,%g3
 +	and	%o5,%g1,%g4
 +	and	%o7,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[11]
 -	add	%g4,%l4,%l4
 +	add	%g4,%o4,%o4
  
 -	add	%g2,%l0,%l0
 -	add	%g2,%l4,%l4
 -	srlx	%g1,32,%g2
 -	add	%l3,%g2,%g2
 -	srl	%l0,6,%l3	!! 12
 -	xor	%l1,%l2,%g5
 -	sll	%l0,7,%g4
 -	and	%l0,%g5,%g5
 -	srl	%l0,11,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,21,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l0,25,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,26,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%l2,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l3,%g3		! Sigma1(e)
 -
 -	srl	%l4,2,%l3
 -	add	%g5,%g2,%g2
 -	ld	[%i3+48],%g5	! K[12]
 -	sll	%l4,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l4,13,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,19,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l4,22,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,30,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%g4,%l3,%l3		! Sigma0(a)
 -
 -	or	%l4,%l5,%g3
 -	and	%l4,%l5,%g4
 -	and	%l6,%g3,%g3
 +	add	%g2,%o0,%o0
 +	add	%g2,%o4,%o4
 +	sllx	%l1,%i4,%g5	! Xload(12)
 +	add	%i4,32,%g3
 +	sllx	%l0,%g3,%g4
 +	
 +	srlx	%l2,%i5,%l1
 +	or	%g4,%g5,%g5
 +	or	%l1,%g5,%g5
 +	
 +	add	%o3,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+96]
 +	bnz,a,pn	%icc,.+8
 +	ld	[%i1+128],%l0
 +	srlx	%o0,14,%o3	!! 12
 +	xor	%o1,%o2,%g5
 +	sllx	%o0,23,%g4
 +	and	%o0,%g5,%g5
 +	srlx	%o0,18,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,46,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o0,41,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,50,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%o2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o3,%g3		! Sigma1(e)
 +
 +	srlx	%o4,28,%o3
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+96],%g5	! K[12]
 +	sllx	%o4,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o4,34,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,30,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o4,39,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,36,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%g4,%o3,%o3		! Sigma0(a)
 +
 +	or	%o4,%o5,%g3
 +	and	%o4,%o5,%g4
 +	and	%g1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[12]
 -	add	%g4,%l3,%l3
 +	add	%g4,%o3,%o3
 +
 +	add	%g2,%o7,%o7
 +	add	%g2,%o3,%o3
 +	sllx	%l3,%i4,%g5	! Xload(13)
 +	add	%i4,32,%g3
 +	sllx	%l2,%g3,%g4
 +	
 +	srlx	%l4,%i5,%l3
 +	or	%g4,%g5,%g5
 +	or	%l3,%g5,%g5
 +	
 +	add	%o2,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+104]
 +	srlx	%o7,14,%o2	!! 13
 +	xor	%o0,%o1,%g5
 +	sllx	%o7,23,%g4
 +	and	%o7,%g5,%g5
 +	srlx	%o7,18,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,46,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o7,41,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,50,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%o1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o2,%g3		! Sigma1(e)
  
 -	add	%g2,%l7,%l7
 -	add	%g2,%l3,%l3
 -	add	%g1,%l2,%g2
 -	srl	%l7,6,%l2	!! 13
 -	xor	%l0,%l1,%g5
 -	sll	%l7,7,%g4
 -	and	%l7,%g5,%g5
 -	srl	%l7,11,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,21,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l7,25,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,26,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%l1,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l2,%g3		! Sigma1(e)
 -
 -	srl	%l3,2,%l2
 -	add	%g5,%g2,%g2
 -	ld	[%i3+52],%g5	! K[13]
 -	sll	%l3,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l3,13,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,19,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l3,22,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,30,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%g4,%l2,%l2		! Sigma0(a)
 -
 -	or	%l3,%l4,%g3
 -	and	%l3,%l4,%g4
 -	and	%l5,%g3,%g3
 +	srlx	%o3,28,%o2
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+104],%g5	! K[13]
 +	sllx	%o3,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o3,34,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,30,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o3,39,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,36,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%g4,%o2,%o2		! Sigma0(a)
 +
 +	or	%o3,%o4,%g3
 +	and	%o3,%o4,%g4
 +	and	%o5,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[13]
 -	add	%g4,%l2,%l2
 +	add	%g4,%o2,%o2
  
 -	add	%g2,%l6,%l6
 -	add	%g2,%l2,%l2
 -	srlx	%o7,32,%g2
 -	add	%l1,%g2,%g2
 -	srl	%l6,6,%l1	!! 14
 -	xor	%l7,%l0,%g5
 -	sll	%l6,7,%g4
 -	and	%l6,%g5,%g5
 -	srl	%l6,11,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,21,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l6,25,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,26,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%l0,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l1,%g3		! Sigma1(e)
 -
 -	srl	%l2,2,%l1
 -	add	%g5,%g2,%g2
 -	ld	[%i3+56],%g5	! K[14]
 -	sll	%l2,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l2,13,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,19,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l2,22,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,30,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%g4,%l1,%l1		! Sigma0(a)
 -
 -	or	%l2,%l3,%g3
 -	and	%l2,%l3,%g4
 -	and	%l4,%g3,%g3
 +	add	%g2,%g1,%g1
 +	add	%g2,%o2,%o2
 +	sllx	%l5,%i4,%g5	! Xload(14)
 +	add	%i4,32,%g3
 +	sllx	%l4,%g3,%g4
 +	
 +	srlx	%l6,%i5,%l5
 +	or	%g4,%g5,%g5
 +	or	%l5,%g5,%g5
 +	
 +	add	%o1,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+112]
 +	srlx	%g1,14,%o1	!! 14
 +	xor	%o7,%o0,%g5
 +	sllx	%g1,23,%g4
 +	and	%g1,%g5,%g5
 +	srlx	%g1,18,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,46,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%g1,41,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,50,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%o0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o1,%g3		! Sigma1(e)
 +
 +	srlx	%o2,28,%o1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+112],%g5	! K[14]
 +	sllx	%o2,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o2,34,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,30,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%o2,39,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,36,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%g4,%o1,%o1		! Sigma0(a)
 +
 +	or	%o2,%o3,%g3
 +	and	%o2,%o3,%g4
 +	and	%o4,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[14]
 -	add	%g4,%l1,%l1
 +	add	%g4,%o1,%o1
  
 -	add	%g2,%l5,%l5
 -	add	%g2,%l1,%l1
 -	add	%o7,%l0,%g2
 -	srl	%l5,6,%l0	!! 15
 -	xor	%l6,%l7,%g5
 -	sll	%l5,7,%g4
 -	and	%l5,%g5,%g5
 -	srl	%l5,11,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,21,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l5,25,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,26,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%l7,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l0,%g3		! Sigma1(e)
 -
 -	srl	%l1,2,%l0
 -	add	%g5,%g2,%g2
 -	ld	[%i3+60],%g5	! K[15]
 -	sll	%l1,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l1,13,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,19,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l1,22,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,30,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%g4,%l0,%l0		! Sigma0(a)
 -
 -	or	%l1,%l2,%g3
 -	and	%l1,%l2,%g4
 -	and	%l3,%g3,%g3
 +	add	%g2,%o5,%o5
 +	add	%g2,%o1,%o1
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+8],%l2
 +	sllx	%l7,%i4,%g5	! Xload(15)
 +	add	%i4,32,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+12],%l3
 +	sllx	%l6,%g3,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+72],%l4
 +	srlx	%l0,%i5,%l7
 +	or	%g4,%g5,%g5
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+76],%l5
 +	or	%l7,%g5,%g5
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+112],%l6
 +	add	%o0,%g5,%g2
 +	stx	%g5,[%sp+STACK_BIAS+STACK_FRAME+120]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+116],%l7
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+0],%l0
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+4],%l1
 +	srlx	%o5,14,%o0	!! 15
 +	xor	%g1,%o7,%g5
 +	sllx	%o5,23,%g4
 +	and	%o5,%g5,%g5
 +	srlx	%o5,18,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,46,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o5,41,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,50,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%o7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o0,%g3		! Sigma1(e)
 +
 +	srlx	%o1,28,%o0
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+120],%g5	! K[15]
 +	sllx	%o1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o1,34,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,30,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o1,39,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,36,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%g4,%o0,%o0		! Sigma0(a)
 +
 +	or	%o1,%o2,%g3
 +	and	%o1,%o2,%g4
 +	and	%o3,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[15]
 -	add	%g4,%l0,%l0
 +	add	%g4,%o0,%o0
  
 -	add	%g2,%l4,%l4
 -	add	%g2,%l0,%l0
 +	add	%g2,%o4,%o4
 +	add	%g2,%o0,%o0
  .L16_xx:
 -	srl	%o0,3,%g2		!! Xupdate(16)
 -	sll	%o0,14,%g4
 -	srl	%o0,7,%g3
 +	sllx	%l2,32,%g3		!! Xupdate(16)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+16],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+20],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o0,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o7,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o0,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o4,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o0,0,%o0
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[16+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+120],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+124],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[16+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+80],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+84],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+8],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[16+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+12],%l1
 +	add	%g5,%g2,%g2		! +=X[16]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+0]
 +	add	%o7,%g2,%g2
 +	srlx	%o4,14,%o7	!! 16
 +	xor	%o5,%g1,%g5
 +	sllx	%o4,23,%g4
 +	and	%o4,%g5,%g5
 +	srlx	%o4,18,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,46,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o4,41,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,50,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o7,%g3		! Sigma1(e)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o0,%o0
 -	add	%l7,%g2,%g2
 -	srl	%l4,6,%l7	!! 16
 -	xor	%l5,%l6,%g5
 -	sll	%l4,7,%g4
 -	and	%l4,%g5,%g5
 -	srl	%l4,11,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,21,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l4,25,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,26,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%l6,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l7,%g3		! Sigma1(e)
 -
 -	srl	%l0,2,%l7
 -	add	%g5,%g2,%g2
 -	ld	[%i3+64],%g5	! K[16]
 -	sll	%l0,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l0,13,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,19,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l0,22,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,30,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%g4,%l7,%l7		! Sigma0(a)
 -
 -	or	%l0,%l1,%g3
 -	and	%l0,%l1,%g4
 -	and	%l2,%g3,%g3
 +	srlx	%o0,28,%o7
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+128],%g5	! K[16]
 +	sllx	%o0,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o0,34,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,30,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o0,39,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,36,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g4,%o7,%o7		! Sigma0(a)
 +
 +	or	%o0,%o1,%g3
 +	and	%o0,%o1,%g4
 +	and	%o2,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[16]
 -	add	%g4,%l7,%l7
 +	add	%g4,%o7,%o7
  
 -	add	%g2,%l3,%l3
 -	add	%g2,%l7,%l7
 -	srlx	%o1,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(17)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o3,%o3
 +	add	%g2,%o7,%o7
 +	sllx	%l2,32,%g3		!! Xupdate(17)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+24],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+28],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o7,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o7,13,%g4
 -	srl	%o7,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o7,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o5,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o0,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o0,%g2,%g2			! +=X[i]
 -	xor	%g3,%o0,%o0
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[17+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+0],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+4],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[17+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+88],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+92],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+16],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[17+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+20],%l1
 +	add	%g5,%g2,%g2		! +=X[17]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+8]
 +	add	%g1,%g2,%g2
 +	srlx	%o3,14,%g1	!! 17
 +	xor	%o4,%o5,%g5
 +	sllx	%o3,23,%g4
 +	and	%o3,%g5,%g5
 +	srlx	%o3,18,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,46,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o3,41,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,50,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%o5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%g1,%g3		! Sigma1(e)
 +
 +	srlx	%o7,28,%g1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+136],%g5	! K[17]
 +	sllx	%o7,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o7,34,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,30,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o7,39,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,36,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%g4,%g1,%g1		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o0,%o0
 -	add	%l6,%g2,%g2
 -	srl	%l3,6,%l6	!! 17
 -	xor	%l4,%l5,%g5
 -	sll	%l3,7,%g4
 -	and	%l3,%g5,%g5
 -	srl	%l3,11,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,21,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l3,25,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,26,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%l5,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l6,%g3		! Sigma1(e)
 -
 -	srl	%l7,2,%l6
 -	add	%g5,%g2,%g2
 -	ld	[%i3+68],%g5	! K[17]
 -	sll	%l7,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l7,13,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,19,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l7,22,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,30,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%g4,%l6,%l6		! Sigma0(a)
 -
 -	or	%l7,%l0,%g3
 -	and	%l7,%l0,%g4
 -	and	%l1,%g3,%g3
 +	or	%o7,%o0,%g3
 +	and	%o7,%o0,%g4
 +	and	%o1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[17]
 -	add	%g4,%l6,%l6
 +	add	%g4,%g1,%g1
  
 -	add	%g2,%l2,%l2
 -	add	%g2,%l6,%l6
 -	srl	%o1,3,%g2		!! Xupdate(18)
 -	sll	%o1,14,%g4
 -	srl	%o1,7,%g3
 +	add	%g2,%o2,%o2
 +	add	%g2,%g1,%g1
 +	sllx	%l2,32,%g3		!! Xupdate(18)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+32],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+36],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o1,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o0,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o1,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o5,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o1,0,%o1
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[18+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+8],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+12],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[18+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+96],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+100],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+24],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[18+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+28],%l1
 +	add	%g5,%g2,%g2		! +=X[18]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+16]
 +	add	%o5,%g2,%g2
 +	srlx	%o2,14,%o5	!! 18
 +	xor	%o3,%o4,%g5
 +	sllx	%o2,23,%g4
 +	and	%o2,%g5,%g5
 +	srlx	%o2,18,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,46,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%o2,41,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,50,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%o4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o5,%g3		! Sigma1(e)
 +
 +	srlx	%g1,28,%o5
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+144],%g5	! K[18]
 +	sllx	%g1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%g1,34,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,30,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%g1,39,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,36,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%g4,%o5,%o5		! Sigma0(a)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o1,%o1
 -	add	%l5,%g2,%g2
 -	srl	%l2,6,%l5	!! 18
 -	xor	%l3,%l4,%g5
 -	sll	%l2,7,%g4
 -	and	%l2,%g5,%g5
 -	srl	%l2,11,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,21,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l2,25,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,26,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%l4,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l5,%g3		! Sigma1(e)
 -
 -	srl	%l6,2,%l5
 -	add	%g5,%g2,%g2
 -	ld	[%i3+72],%g5	! K[18]
 -	sll	%l6,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l6,13,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,19,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l6,22,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,30,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%g4,%l5,%l5		! Sigma0(a)
 -
 -	or	%l6,%l7,%g3
 -	and	%l6,%l7,%g4
 -	and	%l0,%g3,%g3
 +	or	%g1,%o7,%g3
 +	and	%g1,%o7,%g4
 +	and	%o0,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[18]
 -	add	%g4,%l5,%l5
 +	add	%g4,%o5,%o5
  
 -	add	%g2,%l1,%l1
 -	add	%g2,%l5,%l5
 -	srlx	%o2,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(19)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o1,%o1
 +	add	%g2,%o5,%o5
 +	sllx	%l2,32,%g3		!! Xupdate(19)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+40],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+44],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o0,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o0,13,%g4
 -	srl	%o0,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o0,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%g1,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o1,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o1,%g2,%g2			! +=X[i]
 -	xor	%g3,%o1,%o1
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[19+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+16],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+20],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[19+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+104],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+108],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+32],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[19+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+36],%l1
 +	add	%g5,%g2,%g2		! +=X[19]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+24]
 +	add	%o4,%g2,%g2
 +	srlx	%o1,14,%o4	!! 19
 +	xor	%o2,%o3,%g5
 +	sllx	%o1,23,%g4
 +	and	%o1,%g5,%g5
 +	srlx	%o1,18,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,46,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o1,41,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,50,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%o3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o4,%g3		! Sigma1(e)
 +
 +	srlx	%o5,28,%o4
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+152],%g5	! K[19]
 +	sllx	%o5,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o5,34,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,30,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o5,39,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,36,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%g4,%o4,%o4		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o1,%o1
 -	add	%l4,%g2,%g2
 -	srl	%l1,6,%l4	!! 19
 -	xor	%l2,%l3,%g5
 -	sll	%l1,7,%g4
 -	and	%l1,%g5,%g5
 -	srl	%l1,11,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,21,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l1,25,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,26,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%l3,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l4,%g3		! Sigma1(e)
 -
 -	srl	%l5,2,%l4
 -	add	%g5,%g2,%g2
 -	ld	[%i3+76],%g5	! K[19]
 -	sll	%l5,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l5,13,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,19,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l5,22,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,30,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%g4,%l4,%l4		! Sigma0(a)
 -
 -	or	%l5,%l6,%g3
 -	and	%l5,%l6,%g4
 -	and	%l7,%g3,%g3
 +	or	%o5,%g1,%g3
 +	and	%o5,%g1,%g4
 +	and	%o7,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[19]
 -	add	%g4,%l4,%l4
 +	add	%g4,%o4,%o4
  
 -	add	%g2,%l0,%l0
 -	add	%g2,%l4,%l4
 -	srl	%o2,3,%g2		!! Xupdate(20)
 -	sll	%o2,14,%g4
 -	srl	%o2,7,%g3
 +	add	%g2,%o0,%o0
 +	add	%g2,%o4,%o4
 +	sllx	%l2,32,%g3		!! Xupdate(20)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+48],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+52],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o2,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o1,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o2,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%g1,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o2,0,%o2
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[20+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+24],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+28],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[20+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+112],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+116],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+40],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[20+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+44],%l1
 +	add	%g5,%g2,%g2		! +=X[20]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+32]
 +	add	%o3,%g2,%g2
 +	srlx	%o0,14,%o3	!! 20
 +	xor	%o1,%o2,%g5
 +	sllx	%o0,23,%g4
 +	and	%o0,%g5,%g5
 +	srlx	%o0,18,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,46,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o0,41,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,50,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%o2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o3,%g3		! Sigma1(e)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o2,%o2
 -	add	%l3,%g2,%g2
 -	srl	%l0,6,%l3	!! 20
 -	xor	%l1,%l2,%g5
 -	sll	%l0,7,%g4
 -	and	%l0,%g5,%g5
 -	srl	%l0,11,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,21,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l0,25,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,26,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%l2,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l3,%g3		! Sigma1(e)
 -
 -	srl	%l4,2,%l3
 -	add	%g5,%g2,%g2
 -	ld	[%i3+80],%g5	! K[20]
 -	sll	%l4,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l4,13,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,19,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l4,22,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,30,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%g4,%l3,%l3		! Sigma0(a)
 -
 -	or	%l4,%l5,%g3
 -	and	%l4,%l5,%g4
 -	and	%l6,%g3,%g3
 +	srlx	%o4,28,%o3
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+160],%g5	! K[20]
 +	sllx	%o4,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o4,34,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,30,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o4,39,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,36,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%g4,%o3,%o3		! Sigma0(a)
 +
 +	or	%o4,%o5,%g3
 +	and	%o4,%o5,%g4
 +	and	%g1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[20]
 -	add	%g4,%l3,%l3
 +	add	%g4,%o3,%o3
  
 -	add	%g2,%l7,%l7
 -	add	%g2,%l3,%l3
 -	srlx	%o3,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(21)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o7,%o7
 +	add	%g2,%o3,%o3
 +	sllx	%l2,32,%g3		!! Xupdate(21)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+56],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+60],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o1,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o1,13,%g4
 -	srl	%o1,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o1,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o7,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o2,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o2,%g2,%g2			! +=X[i]
 -	xor	%g3,%o2,%o2
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[21+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+32],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+36],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[21+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+120],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+124],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+48],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[21+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+52],%l1
 +	add	%g5,%g2,%g2		! +=X[21]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+40]
 +	add	%o2,%g2,%g2
 +	srlx	%o7,14,%o2	!! 21
 +	xor	%o0,%o1,%g5
 +	sllx	%o7,23,%g4
 +	and	%o7,%g5,%g5
 +	srlx	%o7,18,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,46,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o7,41,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,50,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%o1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o2,%g3		! Sigma1(e)
 +
 +	srlx	%o3,28,%o2
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+168],%g5	! K[21]
 +	sllx	%o3,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o3,34,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,30,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o3,39,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,36,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%g4,%o2,%o2		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o2,%o2
 -	add	%l2,%g2,%g2
 -	srl	%l7,6,%l2	!! 21
 -	xor	%l0,%l1,%g5
 -	sll	%l7,7,%g4
 -	and	%l7,%g5,%g5
 -	srl	%l7,11,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,21,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l7,25,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,26,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%l1,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l2,%g3		! Sigma1(e)
 -
 -	srl	%l3,2,%l2
 -	add	%g5,%g2,%g2
 -	ld	[%i3+84],%g5	! K[21]
 -	sll	%l3,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l3,13,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,19,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l3,22,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,30,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%g4,%l2,%l2		! Sigma0(a)
 -
 -	or	%l3,%l4,%g3
 -	and	%l3,%l4,%g4
 -	and	%l5,%g3,%g3
 +	or	%o3,%o4,%g3
 +	and	%o3,%o4,%g4
 +	and	%o5,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[21]
 -	add	%g4,%l2,%l2
 +	add	%g4,%o2,%o2
  
 -	add	%g2,%l6,%l6
 -	add	%g2,%l2,%l2
 -	srl	%o3,3,%g2		!! Xupdate(22)
 -	sll	%o3,14,%g4
 -	srl	%o3,7,%g3
 +	add	%g2,%g1,%g1
 +	add	%g2,%o2,%o2
 +	sllx	%l2,32,%g3		!! Xupdate(22)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+64],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+68],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o3,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o2,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o3,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o7,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o3,0,%o3
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[22+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+40],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+44],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[22+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+0],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+4],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+56],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[22+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+60],%l1
 +	add	%g5,%g2,%g2		! +=X[22]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+48]
 +	add	%o1,%g2,%g2
 +	srlx	%g1,14,%o1	!! 22
 +	xor	%o7,%o0,%g5
 +	sllx	%g1,23,%g4
 +	and	%g1,%g5,%g5
 +	srlx	%g1,18,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,46,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%g1,41,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,50,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%o0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o1,%g3		! Sigma1(e)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o3,%o3
 -	add	%l1,%g2,%g2
 -	srl	%l6,6,%l1	!! 22
 -	xor	%l7,%l0,%g5
 -	sll	%l6,7,%g4
 -	and	%l6,%g5,%g5
 -	srl	%l6,11,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,21,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l6,25,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,26,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%l0,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l1,%g3		! Sigma1(e)
 -
 -	srl	%l2,2,%l1
 -	add	%g5,%g2,%g2
 -	ld	[%i3+88],%g5	! K[22]
 -	sll	%l2,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l2,13,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,19,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l2,22,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,30,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%g4,%l1,%l1		! Sigma0(a)
 -
 -	or	%l2,%l3,%g3
 -	and	%l2,%l3,%g4
 -	and	%l4,%g3,%g3
 +	srlx	%o2,28,%o1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+176],%g5	! K[22]
 +	sllx	%o2,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o2,34,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,30,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%o2,39,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,36,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%g4,%o1,%o1		! Sigma0(a)
 +
 +	or	%o2,%o3,%g3
 +	and	%o2,%o3,%g4
 +	and	%o4,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[22]
 -	add	%g4,%l1,%l1
 +	add	%g4,%o1,%o1
  
 -	add	%g2,%l5,%l5
 -	add	%g2,%l1,%l1
 -	srlx	%o4,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(23)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o5,%o5
 +	add	%g2,%o1,%o1
 +	sllx	%l2,32,%g3		!! Xupdate(23)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+72],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+76],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o2,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o2,13,%g4
 -	srl	%o2,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o2,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o0,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o3,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o3,%g2,%g2			! +=X[i]
 -	xor	%g3,%o3,%o3
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[23+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+48],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+52],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[23+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+8],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+12],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+64],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[23+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+68],%l1
 +	add	%g5,%g2,%g2		! +=X[23]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+56]
 +	add	%o0,%g2,%g2
 +	srlx	%o5,14,%o0	!! 23
 +	xor	%g1,%o7,%g5
 +	sllx	%o5,23,%g4
 +	and	%o5,%g5,%g5
 +	srlx	%o5,18,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,46,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o5,41,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,50,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%o7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o0,%g3		! Sigma1(e)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o3,%o3
 -	add	%l0,%g2,%g2
 -	srl	%l5,6,%l0	!! 23
 -	xor	%l6,%l7,%g5
 -	sll	%l5,7,%g4
 -	and	%l5,%g5,%g5
 -	srl	%l5,11,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,21,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l5,25,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,26,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%l7,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l0,%g3		! Sigma1(e)
 -
 -	srl	%l1,2,%l0
 -	add	%g5,%g2,%g2
 -	ld	[%i3+92],%g5	! K[23]
 -	sll	%l1,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l1,13,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,19,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l1,22,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,30,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%g4,%l0,%l0		! Sigma0(a)
 -
 -	or	%l1,%l2,%g3
 -	and	%l1,%l2,%g4
 -	and	%l3,%g3,%g3
 +	srlx	%o1,28,%o0
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+184],%g5	! K[23]
 +	sllx	%o1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o1,34,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,30,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o1,39,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,36,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%g4,%o0,%o0		! Sigma0(a)
 +
 +	or	%o1,%o2,%g3
 +	and	%o1,%o2,%g4
 +	and	%o3,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[23]
 -	add	%g4,%l0,%l0
 +	add	%g4,%o0,%o0
  
 -	add	%g2,%l4,%l4
 -	add	%g2,%l0,%l0
 -	srl	%o4,3,%g2		!! Xupdate(24)
 -	sll	%o4,14,%g4
 -	srl	%o4,7,%g3
 +	add	%g2,%o4,%o4
 +	add	%g2,%o0,%o0
 +	sllx	%l2,32,%g3		!! Xupdate(24)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+80],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+84],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o4,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o3,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o4,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o0,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o4,0,%o4
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[24+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+56],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+60],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[24+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+16],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+20],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+72],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[24+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+76],%l1
 +	add	%g5,%g2,%g2		! +=X[24]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+64]
 +	add	%o7,%g2,%g2
 +	srlx	%o4,14,%o7	!! 24
 +	xor	%o5,%g1,%g5
 +	sllx	%o4,23,%g4
 +	and	%o4,%g5,%g5
 +	srlx	%o4,18,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,46,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o4,41,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o4,50,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o7,%g3		! Sigma1(e)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o4,%o4
 -	add	%l7,%g2,%g2
 -	srl	%l4,6,%l7	!! 24
 -	xor	%l5,%l6,%g5
 -	sll	%l4,7,%g4
 -	and	%l4,%g5,%g5
 -	srl	%l4,11,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,21,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l4,25,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l4,26,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%l6,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l7,%g3		! Sigma1(e)
 -
 -	srl	%l0,2,%l7
 -	add	%g5,%g2,%g2
 -	ld	[%i3+96],%g5	! K[24]
 -	sll	%l0,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l0,13,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,19,%g4
 -	xor	%g3,%l7,%l7
 -	srl	%l0,22,%g3
 -	xor	%g4,%l7,%l7
 -	sll	%l0,30,%g4
 -	xor	%g3,%l7,%l7
 -	xor	%g4,%l7,%l7		! Sigma0(a)
 -
 -	or	%l0,%l1,%g3
 -	and	%l0,%l1,%g4
 -	and	%l2,%g3,%g3
 +	srlx	%o0,28,%o7
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+192],%g5	! K[24]
 +	sllx	%o0,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o0,34,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,30,%g4
 +	xor	%g3,%o7,%o7
 +	srlx	%o0,39,%g3
 +	xor	%g4,%o7,%o7
 +	sllx	%o0,36,%g4
 +	xor	%g3,%o7,%o7
 +	xor	%g4,%o7,%o7		! Sigma0(a)
 +
 +	or	%o0,%o1,%g3
 +	and	%o0,%o1,%g4
 +	and	%o2,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[24]
 -	add	%g4,%l7,%l7
 +	add	%g4,%o7,%o7
  
 -	add	%g2,%l3,%l3
 -	add	%g2,%l7,%l7
 -	srlx	%o5,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(25)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o3,%o3
 +	add	%g2,%o7,%o7
 +	sllx	%l2,32,%g3		!! Xupdate(25)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+88],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+92],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o3,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o3,13,%g4
 -	srl	%o3,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o3,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o1,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o4,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o4,%g2,%g2			! +=X[i]
 -	xor	%g3,%o4,%o4
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[25+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+64],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+68],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[25+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+24],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+28],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+80],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[25+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+84],%l1
 +	add	%g5,%g2,%g2		! +=X[25]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+72]
 +	add	%g1,%g2,%g2
 +	srlx	%o3,14,%g1	!! 25
 +	xor	%o4,%o5,%g5
 +	sllx	%o3,23,%g4
 +	and	%o3,%g5,%g5
 +	srlx	%o3,18,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,46,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o3,41,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o3,50,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%o5,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%g1,%g3		! Sigma1(e)
 +
 +	srlx	%o7,28,%g1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+200],%g5	! K[25]
 +	sllx	%o7,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o7,34,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,30,%g4
 +	xor	%g3,%g1,%g1
 +	srlx	%o7,39,%g3
 +	xor	%g4,%g1,%g1
 +	sllx	%o7,36,%g4
 +	xor	%g3,%g1,%g1
 +	xor	%g4,%g1,%g1		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o4,%o4
 -	add	%l6,%g2,%g2
 -	srl	%l3,6,%l6	!! 25
 -	xor	%l4,%l5,%g5
 -	sll	%l3,7,%g4
 -	and	%l3,%g5,%g5
 -	srl	%l3,11,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,21,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l3,25,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l3,26,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%l5,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l6,%g3		! Sigma1(e)
 -
 -	srl	%l7,2,%l6
 -	add	%g5,%g2,%g2
 -	ld	[%i3+100],%g5	! K[25]
 -	sll	%l7,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l7,13,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,19,%g4
 -	xor	%g3,%l6,%l6
 -	srl	%l7,22,%g3
 -	xor	%g4,%l6,%l6
 -	sll	%l7,30,%g4
 -	xor	%g3,%l6,%l6
 -	xor	%g4,%l6,%l6		! Sigma0(a)
 -
 -	or	%l7,%l0,%g3
 -	and	%l7,%l0,%g4
 -	and	%l1,%g3,%g3
 +	or	%o7,%o0,%g3
 +	and	%o7,%o0,%g4
 +	and	%o1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[25]
 -	add	%g4,%l6,%l6
 +	add	%g4,%g1,%g1
  
 -	add	%g2,%l2,%l2
 -	add	%g2,%l6,%l6
 -	srl	%o5,3,%g2		!! Xupdate(26)
 -	sll	%o5,14,%g4
 -	srl	%o5,7,%g3
 +	add	%g2,%o2,%o2
 +	add	%g2,%g1,%g1
 +	sllx	%l2,32,%g3		!! Xupdate(26)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+96],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+100],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o4,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o5,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o1,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o5,0,%o5
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[26+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+72],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+76],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[26+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+32],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+36],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+88],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[26+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+92],%l1
 +	add	%g5,%g2,%g2		! +=X[26]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+80]
 +	add	%o5,%g2,%g2
 +	srlx	%o2,14,%o5	!! 26
 +	xor	%o3,%o4,%g5
 +	sllx	%o2,23,%g4
 +	and	%o2,%g5,%g5
 +	srlx	%o2,18,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,46,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%o2,41,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%o2,50,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%o4,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o5,%g3		! Sigma1(e)
 +
 +	srlx	%g1,28,%o5
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+208],%g5	! K[26]
 +	sllx	%g1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%g1,34,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,30,%g4
 +	xor	%g3,%o5,%o5
 +	srlx	%g1,39,%g3
 +	xor	%g4,%o5,%o5
 +	sllx	%g1,36,%g4
 +	xor	%g3,%o5,%o5
 +	xor	%g4,%o5,%o5		! Sigma0(a)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o5,%o5
 -	add	%l5,%g2,%g2
 -	srl	%l2,6,%l5	!! 26
 -	xor	%l3,%l4,%g5
 -	sll	%l2,7,%g4
 -	and	%l2,%g5,%g5
 -	srl	%l2,11,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,21,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l2,25,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l2,26,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%l4,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l5,%g3		! Sigma1(e)
 -
 -	srl	%l6,2,%l5
 -	add	%g5,%g2,%g2
 -	ld	[%i3+104],%g5	! K[26]
 -	sll	%l6,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l6,13,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,19,%g4
 -	xor	%g3,%l5,%l5
 -	srl	%l6,22,%g3
 -	xor	%g4,%l5,%l5
 -	sll	%l6,30,%g4
 -	xor	%g3,%l5,%l5
 -	xor	%g4,%l5,%l5		! Sigma0(a)
 -
 -	or	%l6,%l7,%g3
 -	and	%l6,%l7,%g4
 -	and	%l0,%g3,%g3
 +	or	%g1,%o7,%g3
 +	and	%g1,%o7,%g4
 +	and	%o0,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[26]
 -	add	%g4,%l5,%l5
 +	add	%g4,%o5,%o5
  
 -	add	%g2,%l1,%l1
 -	add	%g2,%l5,%l5
 -	srlx	%g1,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(27)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o1,%o1
 +	add	%g2,%o5,%o5
 +	sllx	%l2,32,%g3		!! Xupdate(27)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+104],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+108],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o4,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o4,13,%g4
 -	srl	%o4,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o4,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o2,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o5,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o5,%g2,%g2			! +=X[i]
 -	xor	%g3,%o5,%o5
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[27+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+80],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+84],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[27+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+40],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+44],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+96],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[27+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+100],%l1
 +	add	%g5,%g2,%g2		! +=X[27]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+88]
 +	add	%o4,%g2,%g2
 +	srlx	%o1,14,%o4	!! 27
 +	xor	%o2,%o3,%g5
 +	sllx	%o1,23,%g4
 +	and	%o1,%g5,%g5
 +	srlx	%o1,18,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,46,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o1,41,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o1,50,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%o3,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o4,%g3		! Sigma1(e)
 +
 +	srlx	%o5,28,%o4
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+216],%g5	! K[27]
 +	sllx	%o5,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o5,34,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,30,%g4
 +	xor	%g3,%o4,%o4
 +	srlx	%o5,39,%g3
 +	xor	%g4,%o4,%o4
 +	sllx	%o5,36,%g4
 +	xor	%g3,%o4,%o4
 +	xor	%g4,%o4,%o4		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o5,%o5
 -	add	%l4,%g2,%g2
 -	srl	%l1,6,%l4	!! 27
 -	xor	%l2,%l3,%g5
 -	sll	%l1,7,%g4
 -	and	%l1,%g5,%g5
 -	srl	%l1,11,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,21,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l1,25,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l1,26,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%l3,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l4,%g3		! Sigma1(e)
 -
 -	srl	%l5,2,%l4
 -	add	%g5,%g2,%g2
 -	ld	[%i3+108],%g5	! K[27]
 -	sll	%l5,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l5,13,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,19,%g4
 -	xor	%g3,%l4,%l4
 -	srl	%l5,22,%g3
 -	xor	%g4,%l4,%l4
 -	sll	%l5,30,%g4
 -	xor	%g3,%l4,%l4
 -	xor	%g4,%l4,%l4		! Sigma0(a)
 -
 -	or	%l5,%l6,%g3
 -	and	%l5,%l6,%g4
 -	and	%l7,%g3,%g3
 +	or	%o5,%g1,%g3
 +	and	%o5,%g1,%g4
 +	and	%o7,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[27]
 -	add	%g4,%l4,%l4
 +	add	%g4,%o4,%o4
  
 -	add	%g2,%l0,%l0
 -	add	%g2,%l4,%l4
 -	srl	%g1,3,%g2		!! Xupdate(28)
 -	sll	%g1,14,%g4
 -	srl	%g1,7,%g3
 +	add	%g2,%o0,%o0
 +	add	%g2,%o4,%o4
 +	sllx	%l2,32,%g3		!! Xupdate(28)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+112],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+116],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%g1,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%o5,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%g1,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o2,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%g1,0,%g1
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[28+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+88],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+92],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[28+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+48],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+52],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+104],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[28+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+108],%l1
 +	add	%g5,%g2,%g2		! +=X[28]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+96]
 +	add	%o3,%g2,%g2
 +	srlx	%o0,14,%o3	!! 28
 +	xor	%o1,%o2,%g5
 +	sllx	%o0,23,%g4
 +	and	%o0,%g5,%g5
 +	srlx	%o0,18,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,46,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o0,41,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o0,50,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%o2,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o3,%g3		! Sigma1(e)
 +
 +	srlx	%o4,28,%o3
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+224],%g5	! K[28]
 +	sllx	%o4,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o4,34,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,30,%g4
 +	xor	%g3,%o3,%o3
 +	srlx	%o4,39,%g3
 +	xor	%g4,%o3,%o3
 +	sllx	%o4,36,%g4
 +	xor	%g3,%o3,%o3
 +	xor	%g4,%o3,%o3		! Sigma0(a)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%g1,%g1
 -	add	%l3,%g2,%g2
 -	srl	%l0,6,%l3	!! 28
 -	xor	%l1,%l2,%g5
 -	sll	%l0,7,%g4
 -	and	%l0,%g5,%g5
 -	srl	%l0,11,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,21,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l0,25,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l0,26,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%l2,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l3,%g3		! Sigma1(e)
 -
 -	srl	%l4,2,%l3
 -	add	%g5,%g2,%g2
 -	ld	[%i3+112],%g5	! K[28]
 -	sll	%l4,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l4,13,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,19,%g4
 -	xor	%g3,%l3,%l3
 -	srl	%l4,22,%g3
 -	xor	%g4,%l3,%l3
 -	sll	%l4,30,%g4
 -	xor	%g3,%l3,%l3
 -	xor	%g4,%l3,%l3		! Sigma0(a)
 -
 -	or	%l4,%l5,%g3
 -	and	%l4,%l5,%g4
 -	and	%l6,%g3,%g3
 +	or	%o4,%o5,%g3
 +	and	%o4,%o5,%g4
 +	and	%g1,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[28]
 -	add	%g4,%l3,%l3
 +	add	%g4,%o3,%o3
  
 -	add	%g2,%l7,%l7
 -	add	%g2,%l3,%l3
 -	srlx	%o7,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(29)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o7,%o7
 +	add	%g2,%o3,%o3
 +	sllx	%l2,32,%g3		!! Xupdate(29)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+120],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+124],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%o5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%o5,13,%g4
 -	srl	%o5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%o5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o3,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%g1,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%g1,%g2,%g2			! +=X[i]
 -	xor	%g3,%g1,%g1
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[29+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+96],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+100],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[29+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+56],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+60],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+112],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[29+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+116],%l1
 +	add	%g5,%g2,%g2		! +=X[29]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+104]
 +	add	%o2,%g2,%g2
 +	srlx	%o7,14,%o2	!! 29
 +	xor	%o0,%o1,%g5
 +	sllx	%o7,23,%g4
 +	and	%o7,%g5,%g5
 +	srlx	%o7,18,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,46,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o7,41,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o7,50,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%o1,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o2,%g3		! Sigma1(e)
 +
 +	srlx	%o3,28,%o2
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+232],%g5	! K[29]
 +	sllx	%o3,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o3,34,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,30,%g4
 +	xor	%g3,%o2,%o2
 +	srlx	%o3,39,%g3
 +	xor	%g4,%o2,%o2
 +	sllx	%o3,36,%g4
 +	xor	%g3,%o2,%o2
 +	xor	%g4,%o2,%o2		! Sigma0(a)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%g1,%g1
 -	add	%l2,%g2,%g2
 -	srl	%l7,6,%l2	!! 29
 -	xor	%l0,%l1,%g5
 -	sll	%l7,7,%g4
 -	and	%l7,%g5,%g5
 -	srl	%l7,11,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,21,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l7,25,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l7,26,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%l1,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l2,%g3		! Sigma1(e)
 -
 -	srl	%l3,2,%l2
 -	add	%g5,%g2,%g2
 -	ld	[%i3+116],%g5	! K[29]
 -	sll	%l3,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l3,13,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,19,%g4
 -	xor	%g3,%l2,%l2
 -	srl	%l3,22,%g3
 -	xor	%g4,%l2,%l2
 -	sll	%l3,30,%g4
 -	xor	%g3,%l2,%l2
 -	xor	%g4,%l2,%l2		! Sigma0(a)
 -
 -	or	%l3,%l4,%g3
 -	and	%l3,%l4,%g4
 -	and	%l5,%g3,%g3
 +	or	%o3,%o4,%g3
 +	and	%o3,%o4,%g4
 +	and	%o5,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[29]
 -	add	%g4,%l2,%l2
 +	add	%g4,%o2,%o2
  
 -	add	%g2,%l6,%l6
 -	add	%g2,%l2,%l2
 -	srl	%o7,3,%g2		!! Xupdate(30)
 -	sll	%o7,14,%g4
 -	srl	%o7,7,%g3
 +	add	%g2,%g1,%g1
 +	add	%g2,%o2,%o2
 +	sllx	%l2,32,%g3		!! Xupdate(30)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+0],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+4],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%o7,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srlx	%g1,32,%i5
 -	srl	%i5,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%i5,13,%g4
 -	srl	%i5,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%i5,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o7,32,%g4		! X[i]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	add	%o3,%g2,%g2			! +=X[i+9]
 -	add	%g5,%g4,%g4
 -	srl	%o7,0,%o7
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[30+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+104],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+108],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[30+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+64],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+68],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+120],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[30+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+124],%l1
 +	add	%g5,%g2,%g2		! +=X[30]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+112]
 +	add	%o1,%g2,%g2
 +	srlx	%g1,14,%o1	!! 30
 +	xor	%o7,%o0,%g5
 +	sllx	%g1,23,%g4
 +	and	%g1,%g5,%g5
 +	srlx	%g1,18,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,46,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%g1,41,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%g1,50,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%o0,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o1,%g3		! Sigma1(e)
  
 -	sllx	%g2,32,%g3
 -	or	%g3,%o7,%o7
 -	add	%l1,%g2,%g2
 -	srl	%l6,6,%l1	!! 30
 -	xor	%l7,%l0,%g5
 -	sll	%l6,7,%g4
 -	and	%l6,%g5,%g5
 -	srl	%l6,11,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,21,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l6,25,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l6,26,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%l0,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l1,%g3		! Sigma1(e)
 -
 -	srl	%l2,2,%l1
 -	add	%g5,%g2,%g2
 -	ld	[%i3+120],%g5	! K[30]
 -	sll	%l2,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l2,13,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,19,%g4
 -	xor	%g3,%l1,%l1
 -	srl	%l2,22,%g3
 -	xor	%g4,%l1,%l1
 -	sll	%l2,30,%g4
 -	xor	%g3,%l1,%l1
 -	xor	%g4,%l1,%l1		! Sigma0(a)
 -
 -	or	%l2,%l3,%g3
 -	and	%l2,%l3,%g4
 -	and	%l4,%g3,%g3
 +	srlx	%o2,28,%o1
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+240],%g5	! K[30]
 +	sllx	%o2,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o2,34,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,30,%g4
 +	xor	%g3,%o1,%o1
 +	srlx	%o2,39,%g3
 +	xor	%g4,%o1,%o1
 +	sllx	%o2,36,%g4
 +	xor	%g3,%o1,%o1
 +	xor	%g4,%o1,%o1		! Sigma0(a)
 +
 +	or	%o2,%o3,%g3
 +	and	%o2,%o3,%g4
 +	and	%o4,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[30]
 -	add	%g4,%l1,%l1
 +	add	%g4,%o1,%o1
  
 -	add	%g2,%l5,%l5
 -	add	%g2,%l1,%l1
 -	srlx	%o0,32,%i5
 -	srl	%i5,3,%g2		!! Xupdate(31)
 -	sll	%i5,14,%g4
 -	srl	%i5,7,%g3
 +	add	%g2,%o5,%o5
 +	add	%g2,%o1,%o1
 +	sllx	%l2,32,%g3		!! Xupdate(31)
 +	or	%l3,%g3,%g3
 +
 +	srlx	%g3,7,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+8],%l2
 +	sllx	%g3,56,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+12],%l3
 +	srlx	%g3,1,%g3
  	xor	%g4,%g2,%g2
 -	sll	%g4,11,%g4
 +	sllx	%g4,7,%g4
  	xor	%g3,%g2,%g2
 -	srl	%i5,18,%g3
 +	srlx	%g3,7,%g3
  	xor	%g4,%g2,%g2
 -	srl	%g1,10,%g5
 -	xor	%g3,%g2,%g2			! T1=sigma0(X[i+1])
 -	sll	%g1,13,%g4
 -	srl	%g1,17,%g3
 -	xor	%g4,%g5,%g5
 -	sll	%g4,2,%g4
 -	xor	%g3,%g5,%g5
 -	srl	%g1,19,%g3
 -	xor	%g4,%g5,%g5
 -	srlx	%o4,32,%g4	! X[i+9]
 -	xor	%g3,%g5,%g5		! sigma1(X[i+14])
 -	srl	%o7,0,%g3
 -	add	%g5,%g4,%g4
 -	add	%o7,%g2,%g2			! +=X[i]
 -	xor	%g3,%o7,%o7
 +	sllx	%l6,32,%g5
 +	xor	%g3,%g2,%g2		! sigma0(X[31+1])
 +	or	%l7,%g5,%g5
 +
 +	srlx	%g5,6,%g4
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+112],%l6
 +	sllx	%g5,3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+116],%l7
 +	srlx	%g5,19,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%g3,42,%g3
 +	xor	%g5,%g4,%g4
 +	srlx	%g5,42,%g5
 +	xor	%g3,%g4,%g4
 +	sllx	%l4,32,%g3
 +	xor	%g5,%g4,%g4	! sigma1(X[31+14])
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+72],%l4
 +	or	%l5,%g3,%g3
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+76],%l5
 +
 +	sllx	%l0,32,%g5
  	add	%g4,%g2,%g2
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+0],%l0
 +	or	%l1,%g5,%g5
 +	add	%g3,%g2,%g2		! +=X[31+9]
 +	ld	[%sp+STACK_BIAS+STACK_FRAME+4],%l1
 +	add	%g5,%g2,%g2		! +=X[31]
 +	stx	%g2,[%sp+STACK_BIAS+STACK_FRAME+120]
 +	add	%o0,%g2,%g2
 +	srlx	%o5,14,%o0	!! 31
 +	xor	%g1,%o7,%g5
 +	sllx	%o5,23,%g4
 +	and	%o5,%g5,%g5
 +	srlx	%o5,18,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,46,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o5,41,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o5,50,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%o7,%g5,%g5		! Ch(e,f,g)
 +	xor	%g4,%o0,%g3		! Sigma1(e)
  
 -	srl	%g2,0,%g2
 -	or	%g2,%o7,%o7
 -	add	%l0,%g2,%g2
 -	srl	%l5,6,%l0	!! 31
 -	xor	%l6,%l7,%g5
 -	sll	%l5,7,%g4
 -	and	%l5,%g5,%g5
 -	srl	%l5,11,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,21,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l5,25,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l5,26,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%l7,%g5,%g5		! Ch(e,f,g)
 -	xor	%g4,%l0,%g3		! Sigma1(e)
 -
 -	srl	%l1,2,%l0
 -	add	%g5,%g2,%g2
 -	ld	[%i3+124],%g5	! K[31]
 -	sll	%l1,10,%g4
 -	add	%g3,%g2,%g2
 -	srl	%l1,13,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,19,%g4
 -	xor	%g3,%l0,%l0
 -	srl	%l1,22,%g3
 -	xor	%g4,%l0,%l0
 -	sll	%l1,30,%g4
 -	xor	%g3,%l0,%l0
 -	xor	%g4,%l0,%l0		! Sigma0(a)
 -
 -	or	%l1,%l2,%g3
 -	and	%l1,%l2,%g4
 -	and	%l3,%g3,%g3
 +	srlx	%o1,28,%o0
 +	add	%g5,%g2,%g2
 +	ldx	[%i3+248],%g5	! K[31]
 +	sllx	%o1,25,%g4
 +	add	%g3,%g2,%g2
 +	srlx	%o1,34,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,30,%g4
 +	xor	%g3,%o0,%o0
 +	srlx	%o1,39,%g3
 +	xor	%g4,%o0,%o0
 +	sllx	%o1,36,%g4
 +	xor	%g3,%o0,%o0
 +	xor	%g4,%o0,%o0		! Sigma0(a)
 +
 +	or	%o1,%o2,%g3
 +	and	%o1,%o2,%g4
 +	and	%o3,%g3,%g3
  	or	%g3,%g4,%g4	! Maj(a,b,c)
  	add	%g5,%g2,%g2		! +=K[31]
 -	add	%g4,%l0,%l0
 +	add	%g4,%o0,%o0
  
 -	add	%g2,%l4,%l4
 -	add	%g2,%l0,%l0
 +	add	%g2,%o4,%o4
 +	add	%g2,%o0,%o0
  	and	%g5,0xfff,%g5
 -	cmp	%g5,2290
 +	cmp	%g5,2071
  	bne	.L16_xx
 -	add	%i3,64,%i3	! Ktbl+=16
 +	add	%i3,128,%i3	! Ktbl+=16
  
 -	ld	[%i0+0],%o0
 -	ld	[%i0+4],%o1
 -	ld	[%i0+8],%o2
 -	ld	[%i0+12],%o3
 -	ld	[%i0+16],%o4
 -	ld	[%i0+20],%o5
 -	ld	[%i0+24],%g1
 -	ld	[%i0+28],%o7
 -
 -	add	%l0,%o0,%l0
 -	st	%l0,[%i0+0]
 -	add	%l1,%o1,%l1
 -	st	%l1,[%i0+4]
 -	add	%l2,%o2,%l2
 -	st	%l2,[%i0+8]
 -	add	%l3,%o3,%l3
 -	st	%l3,[%i0+12]
 -	add	%l4,%o4,%l4
 -	st	%l4,[%i0+16]
 -	add	%l5,%o5,%l5
 -	st	%l5,[%i0+20]
 -	add	%l6,%g1,%l6
 -	st	%l6,[%i0+24]
 -	add	%l7,%o7,%l7
 -	st	%l7,[%i0+28]
 -	add	%i1,64,%i1		! advance inp
 +	ld	[%i0+0],%l0
 +	ld	[%i0+4],%l1
 +	ld	[%i0+8],%l2
 +	ld	[%i0+12],%l3
 +	ld	[%i0+16],%l4
 +	ld	[%i0+20],%l5
 +	ld	[%i0+24],%l6
 +
 +	sllx	%l0,32,%g3
 +	ld	[%i0+28],%l7
 +	sllx	%l2,32,%g4
 +	or	%l1,%g3,%g3
 +	or	%l3,%g4,%g4
 +	add	%g3,%o0,%o0
 +	add	%g4,%o1,%o1
 +	stx	%o0,[%i0+0]
 +	sllx	%l4,32,%g5
 +	stx	%o1,[%i0+8]
 +	sllx	%l6,32,%g2
 +	or	%l5,%g5,%g5
 +	or	%l7,%g2,%g2
 +	add	%g5,%o2,%o2
 +	stx	%o2,[%i0+16]
 +	add	%g2,%o3,%o3
 +	stx	%o3,[%i0+24]
 +
 +	ld	[%i0+32],%l0
 +	ld	[%i0+36],%l1
 +	ld	[%i0+40],%l2
 +	ld	[%i0+44],%l3
 +	ld	[%i0+48],%l4
 +	ld	[%i0+52],%l5
 +	ld	[%i0+56],%l6
 +
 +	sllx	%l0,32,%g3
 +	ld	[%i0+60],%l7
 +	sllx	%l2,32,%g4
 +	or	%l1,%g3,%g3
 +	or	%l3,%g4,%g4
 +	add	%g3,%o4,%o4
 +	add	%g4,%o5,%o5
 +	stx	%o4,[%i0+32]
 +	sllx	%l4,32,%g5
 +	stx	%o5,[%i0+40]
 +	sllx	%l6,32,%g2
 +	or	%l5,%g5,%g5
 +	or	%l7,%g2,%g2
 +	add	%g5,%g1,%g1
 +	stx	%g1,[%i0+48]
 +	add	%g2,%o7,%o7
 +	stx	%o7,[%i0+56]
 +	add	%i1,128,%i1		! advance inp
  	cmp	%i1,%i2
  	bne	SIZE_T_CC,.Lloop
 -	sub	%i3,192,%i3	! rewind Ktbl
 +	sub	%i3,512,%i3	! rewind Ktbl
  
  	ret
  	restore
 -.type	sha256_block_data_order,#function
 -.size	sha256_block_data_order,(.-sha256_block_data_order)
 -.asciz	"SHA256 block transform for SPARCv9, CRYPTOGAMS by <appro%openssl.org@localhost>"
 +.type	sha512_block_data_order,#function
 +.size	sha512_block_data_order,(.-sha512_block_data_order)
 +.asciz	"SHA512 block transform for SPARCv9, CRYPTOGAMS by <appro%openssl.org@localhost>"
  .align	4
 Index: usr.bin/dc/bcode.c
 ===================================================================
 RCS file: /cvsroot/src/usr.bin/dc/bcode.c,v
 retrieving revision 1.3
 diff -u -p -r1.3 bcode.c
 --- usr.bin/dc/bcode.c	6 Feb 2018 17:58:19 -0000	1.3
 +++ usr.bin/dc/bcode.c	26 Jun 2023 11:16:44 -0000
 @@ -338,7 +338,7 @@ max(u_int a, u_int b)
  	return a > b ? a : b;
  }
  
 -static unsigned long factors[] = {
 +static BN_ULONG factors[] = {
  	0, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
  	100000000, 1000000000
  };
 @@ -370,7 +370,7 @@ scale_number(BIGNUM *n, int s)
  		bn_checkp(ctx);
  
  		bn_check(BN_set_word(a, 10));
 -		bn_check(BN_set_word(p, abs_scale));
 +		bn_check(BN_set_word(p, (BN_ULONG)abs_scale));
  		bn_check(BN_exp(a, a, p, ctx));
  		if (s > 0)
  			bn_check(BN_mul(n, n, a, ctx));
 @@ -394,7 +394,7 @@ split_number(const struct number *n, BIG
  	else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
  		rem = BN_div_word(i, factors[n->scale]);
  		if (f != NULL)
 -			bn_check(BN_set_word(f, rem));
 +			bn_check(BN_set_word(f, (BN_ULONG)rem));
  	} else {
  		BIGNUM *a, *p;
  		BN_CTX *ctx;
 @@ -663,7 +663,7 @@ stackdepth(void)
  
  	i = stack_size(&bmachine.stack);
  	n = new_number();
 -	bn_check(BN_set_word(n->number, i));
 +	bn_check(BN_set_word(n->number, (BN_ULONG)i));
  	push_number(n);
  }
  
 @@ -732,12 +732,12 @@ num_digits(void)
  		case BCODE_NUMBER:
  			digits = count_digits(value->u.num);
  			n = new_number();
 -			bn_check(BN_set_word(n->number, digits));
 +			bn_check(BN_set_word(n->number, (BN_ULONG)digits));
  			break;
  		case BCODE_STRING:
  			digits = strlen(value->u.string);
  			n = new_number();
 -			bn_check(BN_set_word(n->number, digits));
 +			bn_check(BN_set_word(n->number, (BN_ULONG)digits));
  			break;
  		}
  		stack_free_value(value);
 