Commit Graph

2603 Commits

Author SHA1 Message Date
Merry
ec3c597591 backend/arm64: Implement LeastSignificantByte 2022-10-18 15:04:30 +01:00
Merry
a33d186fea backend/arm64: Implement LeastSignificantHalf 2022-10-18 15:04:30 +01:00
Merry
163ed9b185 backend/arm64: Implement LeastSignificantWord 2022-10-18 15:04:30 +01:00
Merry
7c86b06233 backend/arm64: Implement Pack2x64To1x128 2022-10-18 15:04:30 +01:00
Merry
98806139a5 backend/arm64/reg_alloc: Argument HostLoc location 2022-10-18 15:04:30 +01:00
Merry
fe4e864e4c backend/arm64: Implement Pack2x32To1x64 2022-10-18 15:04:30 +01:00
Merry
ff9b92c791 backend/arm64: Implement NZCVFromPackedFlags 2022-10-18 15:04:30 +01:00
Merry
7ea97f7629 backend/arm64: Implement GetLowerFromOp 2022-10-18 15:04:30 +01:00
Merry
92026a456a backend/arm64: Implement GetUpperFromOp 2022-10-18 15:04:30 +01:00
Merry
8c4ea10a38 backend/arm64: Implement GetNZCVFromOp 2022-10-18 15:04:30 +01:00
Merry
e34749336a backend/arm64: Implement GetGEFromOp 2022-10-18 15:04:30 +01:00
Merry
fbcbc1d90d backend/arm64: Implement GetOverflowFromOp 2022-10-18 15:04:30 +01:00
Merry
fb3b828158 backend/arm64: Implement Identity 2022-10-18 15:04:30 +01:00
Merry
97ba8a0f14 backend/arm64: Implement Void 2022-10-18 15:04:30 +01:00
Merry
2a24bb2c1e backend/arm64: Implement Breakpoint 2022-10-18 15:04:30 +01:00
Merry
3a11467220 backend/arm64: Stub all IR instruction implementations 2022-10-18 15:04:30 +01:00
Merry
402abf5ea3 backend/arm64: Implement A32GetExtendedRegister 2022-10-18 15:04:30 +01:00
Merry
84cad9f831 backend/arm64: Implement A32SetCheckBit 2022-10-18 15:04:30 +01:00
Merry
52a46d841b backend/arm64: Implement A32BXWritePC 2022-10-18 15:04:30 +01:00
Merry
67dc7f2e4e backend/arm64: Implement A32UpdateUpperLocationDescriptor 2022-10-18 15:04:30 +01:00
Merry
00ad84b7ab backend/arm64: Initial implementation of terminals 2022-10-18 15:04:30 +01:00
Merry
80c89401b9 a32_address_space: Add StackLayout to stack 2022-10-18 15:04:30 +01:00
Merry
9b2391ec7b backend/arm64/reg_alloc: Implement AssertNoMoreUses 2022-10-18 15:04:30 +01:00
Merry
8e6467bf45 backend/arm64/reg_alloc: Add flag handling 2022-10-18 15:04:30 +01:00
Merry
77436bbbbb backend/arm64: Toy implementation of enough to execute LSLS 2022-10-18 15:04:30 +01:00
Merry
7e046357ff backend/arm64: Initial implementation of register allocator 2022-10-18 15:04:30 +01:00
Merry
3bf2b0aba9 backend/arm64: Adjust how relocations are stored 2022-10-18 15:04:30 +01:00
Merry
e0f091b6a6 backend/arm64: void* -> CodePtr 2022-10-18 15:04:30 +01:00
Merry
f6e80f1e0e backend/arm64: First dummy code execution 2022-10-18 15:04:30 +01:00
Merry
d877777c50 backend/arm64: Initial framework 2022-10-18 15:04:30 +01:00
Wunkolo
e886bfb7c1 backend/x64: Fix FixupLUT argument order
The last two arguments (the fixup responses for finite values) are
neg-pos, not pos-neg. Found this out while re-using this function for
some math stuff. Thankfully nothing currently uses this fixup response.
2022-09-30 23:10:21 +01:00
Merry
af51845a53 decoder_detail: Workaround #708 2022-09-02 21:16:43 +01:00
Bart Ribbers
e49fee0ca1 block_of_code: rename PAGE_SIZE to DYNARMIC_PAGE_SIZE to prevent use of reserved name
PAGE_SIZE is a kernel symbol and, depending on the libc in use, it will
"leak". In this case dynarmic was using its own PAGE_SIZE, and in
combination with the Musl libc the compiler would complain that it was
overwriting the kernel symbol (see the sketch below).
2022-08-25 23:32:18 +01:00
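A minimal illustration of the collision the commit above describes, assuming musl's headers expose `PAGE_SIZE` as a macro; the names and the 4096 value here are purely illustrative:

```cpp
#include <limits.h>  // with musl, this may define PAGE_SIZE as a macro
#include <cstddef>

namespace Dynarmic {
// A project-prefixed name sidesteps the macro entirely; a constant literally
// named PAGE_SIZE would be mangled by the preprocessor or trigger warnings.
constexpr std::size_t DYNARMIC_PAGE_SIZE = 4096;  // value assumed for illustration
}
```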
Merry
bf422a190a decoder_detail: Simplify DYNARMIC_DECODER_GET_MATCHER 2022-08-21 18:22:14 +01:00
Merry
c60fd3f0ac block_of_code: Fix running under Rosetta
Rosetta doesn't have accurate emulation of the sahf instruction
2022-08-05 23:43:01 +01:00
Merry
a38966a874 block_of_code: Extract flag loading into a function
LoadRequiredFlagsForCondFromRax
2022-08-05 23:42:19 +01:00
Merry
d7bd5bb7a7 emit_x64: Use movzx(eax, ah) instead of emitting byte equivalent
Emission fixed in xbyak v6.61
2022-07-31 17:52:35 +01:00
Merry
f33c6f062b Revert "block_of_code: Refactor MConst to Xmm{B}Const"
This reverts commit 5d9b720189.

This commit caused obscure bugs due to assumptions regarding zero-extension of higher bits.
2022-07-27 20:31:08 +01:00
Merry
fbdcfeab99 emit_x64_packed: Do not use XmmBConst here
Broadcasting is inappropriate
2022-07-27 20:14:49 +01:00
Merry
1f51dceb60 Update for fmt 9.0.0 2022-07-26 11:20:47 +01:00
Merry
82d71b850e a32_emit_x64: Bugfix for A32GetCpsr for non-FastBMI2
Incorrect loading of E and T flags
2022-07-26 10:44:30 +01:00
Merry
a2b3199adf Convert NZCV to C flag where able 2022-07-23 11:46:07 +01:00
Merry
6bcc424e1a emit_x64_vector: Ensure FPSR.QC is set even if output is invalidated 2022-07-20 19:44:39 +01:00
Merry
34cb465fc7 translate_thumb: IsThumb16: Mask not required 2022-07-20 17:34:31 +01:00
Merry
72c87d11e4 a32_get_set_elimination_pass: Correct insertion point 2022-07-20 16:53:48 +01:00
Merry
da2b1c5724 a32_get_set_elimination_pass: Convert NZ to NZC 2022-07-20 16:45:14 +01:00
Merry
6f106602ba a32_get_set_elimination_pass: Add option to disable NZC -> NZ conversion 2022-07-20 16:42:39 +01:00
Merry
52aa68c31c backend/x64: Fixup NZ flag emission 2022-07-20 14:58:28 +01:00
Merry
b97147e187 a32_get_set_elimination_pass: Reduce NZC to 00C 2022-07-20 14:44:33 +01:00
Merry
03dcc3fa50 a32_get_set_elimination_pass: Reduce NZC to NZ where possible 2022-07-20 14:08:41 +01:00
Merry
cf08130f2c A32: Condense flag handling
Remove individual flag handlers and handle flags in chunks where possible, to produce more optimal code.
2022-07-19 22:05:13 +01:00
Merry
2e1ab36240 microinstruction: Also track MostSignificantBit and IsZero{32,64} as pseudoops 2022-07-19 22:02:56 +01:00
Merry
ac19912fe7 microinstruction: Optimize storage of associated pseudooperation 2022-07-19 22:02:18 +01:00
Merry
51a89dbb7a A64CallbackConfigPass: Ensure IR instructions emitted by this pass have correct location descriptors attached 2022-07-17 22:42:56 +01:00
Merry
da5d06c32a backend/x64: Remove unused member halt_requested from StackLayout 2022-07-15 15:19:01 +01:00
Merry
840982be95 block_of_code: Remove far code machinery 2022-07-14 08:58:00 +01:00
Merry
dd60f4b7d8 emit_x64_memory: Use deferred emits 2022-07-14 08:58:00 +01:00
Merry
0d1e4fc4a8 a32_emit_x64: Remove use of far code from EmitTerminalImpl LinkBlock 2022-07-14 08:58:00 +01:00
Merry
36f6114559 emit_x64_vector_floating_point: Use deferred emits 2022-07-14 08:58:00 +01:00
Merry
7d5e078baa emit_x64_floating_point: MSVC fixup 2022-07-14 08:58:00 +01:00
Merry
11ba75b7f0 emit_x64_floating_point: Use deferred emits 2022-07-14 08:58:00 +01:00
Merry
6c38ed8a89 emit_x86: Introduce the concept of deferred emits
Remove the concept of the far code region
2022-07-14 08:58:00 +01:00
Merry
b6ddeeea0f Implement memory aborts 2022-07-13 12:38:03 +01:00
Merry
285e617e35 Revert "frontend: Add option to halt after memory accesses (#682)"
This reverts commit 5ad1d02351.
2022-07-13 12:34:37 +01:00
Merry
7016ace72b llvm_disassemble: Add hex output 2022-07-12 19:20:25 +01:00
Merry
cd85b7fdaa emit_x64: Fix bugs in fast dispatcher
* We failed to invalidate entries if there are no patches required for a location descriptor.
* Bug in A64 hashing code (rbx instead of rbp).
* Bug in A32 and A64 lookup code (inconsistent choice of key: PC vs IR::LocationDescriptor).
* Test case added.
2022-07-11 16:06:54 +01:00
Wunkolo
a5318c775c constant_pool: Use std::span to manage pool
Simplifies some raw pointer arithmetic and type-usage into the new
`ConstantT` type.
2022-07-07 23:46:21 +01:00
Wunkolo
5d9b720189 block_of_code: Refactor MConst to Xmm{B}Const
`MConst` is refactored into `XmmConst` to clearly communicate the
addressable space of the newly allocated 16-byte memory constant.
`GetVectorOf` is elevated into a globally available `XmmBConst` function
that "broadcasts" bits of the input-value into n-bit elements that span
the width of the Xmm-constant.

`emit_x64_floating_point` will utilize the same 16-byte
broadcasted-constants to encourage more cache-hits within the
constant-pool between vector and non-vector code.
2022-07-07 23:46:05 +01:00
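A rough sketch of the "broadcast" idea described above, independent of the actual `XmmBConst` implementation: replicate an esize-bit value across both 64-bit halves of a 128-bit constant. The names and types here are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

struct U128 { std::uint64_t lo, hi; };  // stand-in for a 16-byte constant

// Replicate the low `esize` bits of `value` across a 128-bit constant.
constexpr U128 BroadcastConstant(std::size_t esize, std::uint64_t value) {
    const std::uint64_t mask = esize == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << esize) - 1;
    std::uint64_t lane = 0;
    for (std::size_t i = 0; i < 64; i += esize) {
        lane |= (value & mask) << i;
    }
    return U128{lane, lane};
}
```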
Liam
02c8b434c7 interface: allow clear of previously-signaled halt 2022-07-07 23:45:09 +01:00
Wunkolo
4d78d167d6 emit_x64_{vector_}floating_point: Add AVX512 implementation for ForceToDefaultNaN
`vfpclassp* k, xmm, i8` has better latency (4->3) and uses execution
ports (01->5) that are out of the way of the ALU ports, compared to
`vcmpunordp* xmm, xmm, xmm` (`vcmpp* xmm, xmm, xmm, i8`), and removes the
pipeline dependency on `xmm0` in favor of AVX512 `k`-mask registers.

`vblendmp* xmm, k, xmm, mem` is about the same throughput and latency as
`blendvp* xmm, mem` but has the benefit of embedded broadcasts to reduce
memory bandwidth (a 32/64-bit read rather than a 128-bit one) and lends
itself to a future size-optimization feature of `constant_pool`.
2022-06-22 00:08:49 +01:00
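The same idea expressed with intrinsics rather than the emitted assembly; this is only a sketch (the function name and the default-NaN payload are assumptions), and it requires AVX512DQ and AVX512VL.

```cpp
#include <immintrin.h>

// Classify NaN lanes into a k-mask with vfpclassps, then blend in the default
// NaN with vblendmps; no xmm0 dependency as there is with blendvps.
__m128 ForceToDefaultNaN(__m128 x) {
    const __m128 default_nan = _mm_castsi128_ps(_mm_set1_epi32(0x7FC00000));  // quiet NaN, payload assumed
    const __mmask8 is_nan = _mm_fpclass_ps_mask(x, 0x81);  // 0x01 = QNaN, 0x80 = SNaN
    return _mm_mask_blend_ps(is_nan, x, default_nan);
}
```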
Wunkolo
6367a26e62 emit_x64_{vector_}floating_point: Add AVX512 implementation for DenormalsAreZero
Both single and double precision floating point numbers, as well as the
packed and unpacked versions of this instruction, will be able to use the
same memory constant. This takes advantage of the fact that `VFIXUPIMM*`
doesn't just copy from the source: it converts the value to `0.0` if it
turns out to be a denormal and the `MXCSR.DAZ` flag is set.

```
tsrc[31:0]←((src1[30:23] = 0) AND (MXCSR.DAZ =1)) ? 0.0 : src1[31:0]
...
CASE(token_response[3:0]) {
    ...
    0001: dest[31:0]←tsrc[31:0]; ; pass through src1 normal input value, denormal as zero
    ...
```
2022-06-22 00:08:14 +01:00
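A scalar model of the documented behaviour the commit above relies on, i.e. the DAZ pre-step applied to the source before the fixup table lookup; this is a sketch, not dynarmic code.

```cpp
#include <bit>
#include <cstdint>

// If the exponent field is zero and the mantissa is non-zero, the value is a
// denormal and is treated as 0.0 before the fixup table lookup.
float DenormalAsZero(float x) {
    const std::uint32_t bits = std::bit_cast<std::uint32_t>(x);
    const bool zero_exponent = (bits & 0x7F800000u) == 0;
    const bool nonzero_mantissa = (bits & 0x007FFFFFu) != 0;
    return (zero_exponent && nonzero_mantissa) ? 0.0f : x;
}
```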
Wunkolo
3ed2aebb20 backend/x64: Update FpFixup constants with denormal behavior
There is an important subtlety that should be documented here. All the
operands of `FpFixup` that read from the `Src` register actually do a
`DAZ` operation if `MXCSR.DAZ` is set.
2022-06-22 00:08:14 +01:00
Merry
d40557b751 A32/A64: Allow std::nullopt from MemoryReadCode
Raise a fault at runtime if this block is executed
2022-06-21 21:41:27 +01:00
liamwhite
5ad1d02351
frontend: Add option to halt after memory accesses (#682)
Intended for library users wishing to implement accurate memory watchpoints.

* A32: optionally make memory instructions the end of basic blocks
* A64: optionally make memory instructions the end of basic blocks
* Make memory halt checking user-configurable
* oops
2022-06-16 18:09:04 +01:00
SachinVin
46989efc2b asimd_one_reg_modified_immediate.cpp: Rename mvn to mvn_ 2022-05-28 13:27:14 +01:00
Merry
e44ac5b84c CMakeLists: Allow building on arm64 2022-05-28 13:27:14 +01:00
Merry
2779f24862 emit_x64_packed: Optimize GE flag generation for signed packed add/sub
sum >= 0 is equivalent to sum > -1
2022-05-17 23:50:51 +01:00
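In intrinsic form, the identity above looks roughly like this (a sketch; dynarmic emits the equivalent instructions directly, and the function name is an assumption):

```cpp
#include <immintrin.h>

// There is no packed "compare >= 0" for signed 16-bit lanes, but comparing
// against -1 with pcmpgtw produces the same all-ones/all-zeros GE mask.
__m128i GeMaskS16(__m128i sum) {
    return _mm_cmpgt_epi16(sum, _mm_set1_epi16(-1));  // sum >= 0  <=>  sum > -1
}
```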
Merry
b224fad171 emit_x64_vector_floating_point: Implement workaround for issue 678 2022-05-17 21:06:16 +01:00
Merry
b1dc11a32d exception_handler_macos: Avoid use of deprecated function mach_port_destroy 2022-05-17 20:47:13 +01:00
Merry
e007d94133 backend/x64: Use templated lambda in each use of GenerateLookupTableFromList 2022-05-17 20:25:27 +01:00
Merry
57af72a567 CMakeLists: Make mcl a public link dependency 2022-04-19 20:33:26 +01:00
Liam
898f14b772 backend/x64: use mmap for all code allocations on Linux 2022-04-19 18:45:46 +01:00
Merry
78b4ba10c9 Migrate to mcl 2022-04-19 18:05:04 +01:00
Merry
de4154aa18 externals: Remove mp and replace uses with mcl 2022-04-19 16:28:28 +01:00
Wunkolo
27bbf4501b backend/x64: Use upper EVEX registers as scratch space
AVX512 adds an additional **16** simd registers, for a total of 32 simd
registers, accessible by using EVEX-encoded instructions. Rather than
using the `ScratchXmm` function, which adds register pressure and
spilling, AVX512-enabled contexts can directly use the `xmm{16-31}`
registers as intermediate scratch registers.
2022-04-06 17:41:55 +01:00
merry
644172477e Implement enable_cycle_counting 2022-04-03 16:10:32 +01:00
merry
aac1f6ab1b Implement halt_reason
* Provide reason for halting and atomically update this.
* Allow user to specify a halt reason and return this information on halt.
* Check if halt was requested prior to starting execution.
2022-04-03 15:37:20 +01:00
merry
116297ccd5 common: Add atomic
Implement atomic or operation on u32
2022-04-03 15:30:39 +01:00
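A minimal sketch of what an "atomic OR on u32" helper can look like using the standard library; the real helper's name and implementation may differ.

```cpp
#include <atomic>
#include <cstdint>

// Atomically OR `bits` into `word` and return the previous value, e.g. for
// setting halt-reason flags from another thread.
inline std::uint32_t AtomicOr(std::atomic<std::uint32_t>& word, std::uint32_t bits) {
    return word.fetch_or(bits, std::memory_order_seq_cst);
}
```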
merry
f6be6bc14b emit_x64_memory: Appease MSVC
Associated with changes in 8bcd46b7e9
2022-04-02 20:41:34 +01:00
merry
8bcd46b7e9 emit_x64_memory: Ensure 128-bit loads/stores are atomic 2022-04-02 19:33:48 +01:00
merry
e27733464b emit_x64_memory: Always order exclusive accesses 2022-04-02 19:33:15 +01:00
merry
cd91a36613 emit_x64_memory: Fix bug in 16-bit ordered EmitReadMemoryMov 2022-04-02 19:32:46 +01:00
merry
9cadab8fa9 backend/emit_x64_memory: Enforce memory ordering 2022-03-29 20:57:34 +01:00
merry
675efecf47 emit_x64_memory: Combine A32 and A64 memory code 2022-03-29 20:51:50 +01:00
merry
af2d50288f A64/sys_ic: Return to dispatch on possible invalidation 2022-03-27 15:27:34 +01:00
merry
cf0709c7f1 emit_x64_memory: Share Emit{Read,Write}MemoryMove 2022-03-26 16:51:55 +00:00
merry
64adc91ca2 emit_x64_memory: Move EmitFastmemVAddr to common file 2022-03-26 16:49:14 +00:00
merry
18f02e2088 emit_x64_memory: Move EmitVAddrLookup to common file 2022-03-26 16:46:06 +00:00
merry
3d657c450a emit_x64_memory: Share EmitDetectMisalignedVAddr 2022-03-26 16:09:56 +00:00
merry
fb586604b4 emit_x64_memory: Share constants 2022-03-26 16:05:03 +00:00
merry
5cf2d59913 A32: Add AccType information and propagate to IR-level 2022-03-26 15:38:10 +00:00
merry
614ecb7020 A64: Propagate AccType information to IR-level 2022-03-26 15:38:10 +00:00
merry
879f211686 ir/value: Add AccType to Value 2022-03-26 15:38:10 +00:00
Alexandre Bouvier
9d369436d8 cmake: Fix unicorn and llvm 2022-03-22 20:27:01 +00:00
merry
c78b82dd2c vfp: VLDM is UNPREDICTABLE when n is R15 in Thumb mode 2022-03-20 20:52:11 +00:00
Sergi Granell
0ec4a23710 thumb32: Implement LDA and STL
Note that those are ARMv8 additions to the Thumb instruction set.
2022-03-20 20:16:27 +00:00
merry
e1a266b929 A32: Implement SHA256SU1 2022-03-20 13:59:18 +00:00
merry
ab4c6cfefb A32: Implement SHA256SU0 2022-03-20 13:59:18 +00:00
merry
c022a778d6 A32: Implement SHA256H, SHA256H2 2022-03-20 13:59:18 +00:00
merry
bb713194a0 backend/x64: Implement SHA256 polyfills 2022-03-20 13:59:18 +00:00
merry
98cff8dd0d IR: Implement SHA256MessageSchedule{0,1} 2022-03-20 13:59:18 +00:00
merry
f0a4bf1f6a IR: Implement SHA256Hash 2022-03-20 13:59:18 +00:00
merry
a4daad6336 block_of_code: Add HostFeature SHA 2022-03-20 00:13:03 +00:00
Merry
bcfe377aaa x64/reg_alloc: More zero extension paranoia 2022-03-06 12:24:50 +00:00
Merry
316b95bb3f {a32,a64}_emit_x64_memory: Zero extension paranoia 2022-03-06 12:10:40 +00:00
Merry
0fd32c5fa4 a64_emit_x64_memory: Fix bug in 128 bit exclusive write fallback 2022-02-28 19:53:43 +00:00
merry
5ea2b49ef0
backend/x64: Inline exclusive memory access operations (#664)
* a64_emit_x64_memory: Add Unsafe_IgnoreGlobalMonitor optimization

* a32_emit_x64_memory: Add Unsafe_IgnoreGlobalMonitor optimization

* a32_emit_x64_memory: Remove dead code

* {a32,a64}_emit_x64_memory: Also verify vaddr in Exclusive{Read,Write}MemoryInlineUnsafe

* a64_emit_x64_memory: Full fallback for ExclusiveWriteMemoryInlineUnsafe

* a64_emit_x64_memory: Inline full locking

* a64_emit_x64_memory: Allow inlined locking to be optionally removed

* spin_lock: Use xbyak instead of inline asm

* a64_emit_x64_memory: Recompile on exclusive fastmem failure

* Avoid variable shadowing

* a32_emit_x64_memory: Implement recompilation

* Fix recompilation

* spin_lock: Clang format fix

* fix fallback function calls
2022-02-28 08:13:10 +00:00
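For the spin_lock bullet in the list above, the behaviour being emitted is roughly the following, shown here with `std::atomic` instead of the xbyak-generated code (a sketch, not the actual implementation):

```cpp
#include <atomic>

// Simple test-and-set spin lock guarding the exclusive-monitor fast path.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void Lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // spin until the holder releases the lock
        }
    }

    void Unlock() {
        flag.clear(std::memory_order_release);
    }
};
```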
merry
0a11e79b55 backend/x64: Ensure all HostCalls are appropriately zero-extended 2022-02-27 20:04:44 +00:00
merry
6c4fa780e0 {a32,a64}_emit_x64_memory: Ensure return value of fastmem callback are zero-extended 2022-02-27 19:58:23 +00:00
merry
593de127d2 a64_emit_x64: Clear fastmem patch information on ClearCache 2022-02-27 19:50:05 +00:00
Merry
c90173151e backend/x64: Split off memory emitters 2022-02-26 21:25:09 +00:00
Merry
19a423034e block_of_code: Fix inaccurate size reporting in SpaceRemaining
Typo: getCode should be getCurr. Instead of comparing against the current pointer,
we were incorrectly comparing against the start of the memory region (see the sketch below).
2022-02-26 16:09:11 +00:00
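The bug class described above, in schematic form; the names and signature here are assumptions, not the actual block_of_code interface.

```cpp
#include <cstddef>
#include <cstdint>

// Correct: measure from the current emission pointer.
std::size_t SpaceRemaining(const std::uint8_t* region_start, std::size_t region_size,
                           const std::uint8_t* current) {
    return region_size - static_cast<std::size_t>(current - region_start);
}
// The buggy version used the region start ("getCode") instead of the current
// pointer ("getCurr"), so it always reported the full region as free.
```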
Merry
ea08a389b4 emit_x64_floating_point: EmitFPToFixed: No need to round if rounding_mode == TowardsZero
cvttsd2si truncates during operation
2022-02-23 20:44:02 +00:00
merry
b34214f953 emit_x64_floating_point: Improve EmitFPToFixed codegen 2022-02-23 19:42:15 +00:00
merry
5fe274f510 emit_x64_floating_point: Deinterlace 64-bit FPToFixed signed/unsigned codepaths 2022-02-23 19:14:41 +00:00
merry
b8dd1c7510 emit_x64_floating_point: Correct dead-code warning in MSVC 2019 2022-02-12 22:07:26 +00:00
merry
95a1ebfb97 backend/x64: Bugfix: A32 frontend also uses FPSCR.QC 2022-02-12 21:46:45 +00:00
Fernando Sahmkow
a8cbfd9af4 X86_Backend: set fences correctly for memory barriers and synchronization. 2022-02-01 14:27:54 +00:00
liushuyu
40afbe1927
disassembler_thumb: fix formatting issues with fmt 8.1.x
fmt 8.1.0 added more formatting checks and Cond can't be formatted
directly now
2022-01-05 21:49:51 -07:00
Wunkolo
ad5465d6ce constant_pool: Use tsl::robin_map rather than unordered_map
Found a much more drastic improvement with `robin_map`.

`map`:
```
[master] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     567.0 ms ±   6.9 ms    [User: 513.1 ms, System: 53.2 ms]
  Range (min … max):   554.4 ms … 588.1 ms    100 runs
```

`unordered_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     561.1 ms ±   4.5 ms    [User: 508.1 ms, System: 52.3 ms]
  Range (min … max):   552.6 ms … 574.2 ms    100 runs
```

`tsl::robin_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     553.5 ms ±   5.6 ms    [User: 500.7 ms, System: 52.1 ms]
  Range (min … max):   545.7 ms … 569.3 ms    100 runs
```
2022-01-01 12:13:13 +00:00
Wunkolo
e57bb0569a constant_pool: Convert hashtype from tuple to pair 2022-01-01 12:13:13 +00:00
Wunkolo
befc22a61e constant_pool: Use unordered_map rather than map
`map` is an ordered structure with O(log n) time searches.
`unordered_map` has O(1) average-time searches and O(n) in the worst
case, where a bucket has a colliding hash and has to start chaining.
The unordered version should speed up our general case when looking up
constants.

I've added a trivial order-dependent (_(0,1) and (1,0) will return a
different hash_) hash to combine a 128-bit constant into a
64-bit hash that generally will not collide, using a bit-rotate to
preserve entropy (see the sketch below).
2022-01-01 12:13:13 +00:00
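The kind of order-dependent combine described above can be sketched like this; the rotation amount and operator are illustrative, not necessarily what `constant_pool` uses.

```cpp
#include <bit>
#include <cstdint>

// Rotate one half before XOR so that (lo, hi) and (hi, lo) hash differently
// while preserving entropy from both 64-bit halves of the 128-bit constant.
constexpr std::uint64_t CombineConstantHash(std::uint64_t lo, std::uint64_t hi) {
    return lo ^ std::rotl(hi, 13);
}
```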
Morph
28714ee75a general: Rename files with duplicate names
In MSVC, having files with identical filenames will result in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h.
2021-12-23 11:38:58 +00:00
Andrea Pappacoda
4dcebc1822 build(cmake): add install target
This makes dynarmic installable, and also adds a CMake package config
file that allows projects to use `find_package(dynarmic)` to import the
library.

I know #636 adds the same thing, but while experimenting with the
different install options in
https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034
I ended up with a working patch, so I'm proposing this as well. This
implements solution 2.
2021-10-30 19:03:23 +01:00
Andrea Pappacoda
b87a889d98 build(cmake): add version and soversion to the library
This adds versioning information to the built library.

When building the shared library on Linux systems, a new object will
be created: libdynarmic.so.5

This is really useful when talking about ABI compatibility.

The variables dynarmic_VERSION and dynarmic_VERSION_MAJOR
are implicitly created when calling project(dynarmic VERSION x.y.z)
2021-10-11 06:53:05 +01:00
Fernando S
e4146ec3a1
x64 Interface: Allow for asynchronous invalidation (#647)
* x64 Interface: Make Invalidation asynchronous.

* Apply suggestions from code review
2021-10-05 15:06:41 +01:00
Wunkolo
5e7d2afe0f IR: Introduce VectorReduceAdd{8,16,32,64} opcode
Adds all the elements of a vector and puts the result into the lowest element.
Accelerates the `addv` instruction into a vectorized implementation
rather than a serial one.
2021-09-27 19:54:11 +01:00
Marshall Mohror
0b8fd755d8 Fix signal_stack_size for glibc 2.34
`SIGSTKSZ` is now defined as `sysconf(_SC_SIGSTKSZ)`, which is not constexpr and returns a `long`, which throws off the `std::max` template deduction (see the sketch below).
2021-09-22 20:38:11 +01:00
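A sketch of the shape of the fix, assuming the stack size is chosen as the larger of `SIGSTKSZ` and some fixed floor; the floor value and variable name are illustrative.

```cpp
#include <algorithm>
#include <csignal>
#include <cstddef>

// SIGSTKSZ may expand to sysconf(_SC_SIGSTKSZ) (a long, not a constant), so
// force a common type rather than relying on std::max's template deduction.
const std::size_t signal_stack_size =
    std::max(static_cast<std::size_t>(SIGSTKSZ), std::size_t{64 * 1024});
```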
Ben
6ce8bfaf32
Add API function to retrieve dissassembly as vector of strings (#644)
Co-authored-by: ben <Avuxo@users.noreply.github.com>
2021-09-16 16:45:20 -04:00
Merry
517e35f845 decoder_detail: Avoid MSVC ICE
MSVC has an internal compiler error when assume is present in this constexpr function
2021-08-15 19:32:05 +01:00
Merry
2e4f99ae3d CMakeLists: Expose DYNARMIC_IGNORE_ASSERTS option 2021-08-15 16:09:37 +01:00
Merry
4988d9fab3 disassembler_arm: Fix format strings for vfp_VMOV_from_i{8,16} 2021-08-15 15:16:53 +01:00
Merry
615ce8c7c5 IR: Remove A32 IR instructions Get{N,Z,V}Flag 2021-08-12 13:06:15 +01:00
Wunkolo
1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variant of `FMUL` and `FMLA` is very
common in instruction streams when implementing things like
matrix multiplication. When used, they are used very densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

The way this is currently implemented is by grabbing the particular lane
into a general purpose register and then broadcasting it into a simd
register through `VectorGetElement` and `VectorBroadcast`.

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

What could be done instead is to keep it within
the vector-register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR-round-trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
2021-08-07 23:03:57 +01:00
Wunkolo
46b8cfabc0 bit_util: Protect Replicate from automatic up-casting
Recursive calls to `Replicate` beyond the first call might
cause an unintentional up-casting to an `int` type due
to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`.

This makes sure calls such as `Replicate<u8>` stay as the `u8` type
throughout (see the sketch below).
2021-08-07 23:03:57 +01:00
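A sketch of the kind of protection described above: casting every intermediate back to `T` so that `|` and `<<` on `u8`/`u16` operands don't silently promote to `int`. The names and structure are illustrative, not dynarmic's exact implementation.

```cpp
#include <cstddef>

// Replicate the low `element_size` bits of `value` across the full width of T.
template <typename T>
constexpr T Replicate(T value, std::size_t element_size) {
    T result = value;
    // Double the filled width each step, casting back to T to defeat integer promotion.
    for (std::size_t filled = element_size; filled < sizeof(T) * 8; filled *= 2) {
        result = static_cast<T>(result | static_cast<T>(result << filled));
    }
    return result;
}
```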
Merry
d41bc492fe {a32,a64}_jitstate: Remove unnecessary headers 2021-08-07 19:35:33 +01:00
Merry
07b5734fb0 xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we prefer not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead.
2021-08-07 15:13:49 +01:00
Merry
59fb568b27 tests: Use Zydis for disassembly 2021-08-06 15:29:43 +01:00
Wunkolo
f33bd69ec2 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVectorToFixed
AVX512 introduces the _unsigned_ variant of the float-to-integer conversion
functions via `vcvttp{sd}2u{dq}q`. In the case that a value is not
representable as an unsigned integer, the result is `0xFFFFF...`,
which can be used to get "free" saturation when the floating point
value exceeds the unsigned range, after masking away negative values.

https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq

This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq
2021-07-17 22:13:11 +01:00
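In intrinsic form, the unsigned single-precision case described above looks roughly like this (a sketch; it requires AVX512F and AVX512VL, and the function name is an assumption):

```cpp
#include <immintrin.h>

// vcvttps2udq already yields 0xFFFFFFFF for values too large to represent,
// so only negative inputs need clamping to zero for full unsigned saturation.
__m128i FPVectorToFixedU32(__m128 x) {
    const __m128 non_negative = _mm_max_ps(x, _mm_setzero_ps());
    return _mm_cvttps_epu32(non_negative);
}
```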
SachinVin
048da372e9 block_of_code.cpp: remove redundant align() 2021-07-17 22:12:31 +01:00