The last two arguments (fixup responses for finite values) are
neg-pos, not pos-neg. Found this out while reusing this function for
some math work. Thankfully, nothing currently uses this fixup response.
PAGE_SIZE is a kernel symbol and, depending on the libc in use, it will
"leak". In this case dynarmic was using its own PAGE_SIZE, and in
combination with musl libc the compiler would complain that it was
overwriting the kernel symbol.
* We failed to invalidate entries if no patches were required for a location descriptor.
* Bug in A64 hashing code (rbx instead of rbp).
* Bug in A32 and A64 lookup code (inconsistent choice of key: PC vs IR::LocationDescriptor).
* Test case added.
`MConst` is refactored into `XmmConst` to clearly communicate the
addressable space of the newly allocated 16-byte memory constant.
`GetVectorOf` is elevated into a globally available `XmmBConst` function
that "broadcasts" bits of the input value into n-bit elements spanning
the width of the Xmm constant.
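As a rough sketch of the broadcast computation (illustrative only; the real helper also places the result in the 16-byte constant pool, and this name is not the actual signature):

```cpp
#include <cstddef>
#include <cstdint>

// Replicate an esize-bit pattern across a 64-bit half; the Xmm constant is
// then simply two copies of this value. esize is assumed to be 8/16/32/64.
constexpr std::uint64_t BroadcastToU64(std::size_t esize, std::uint64_t value) {
    std::uint64_t result = value;
    for (std::size_t width = esize; width < 64; width *= 2) {
        result |= result << width;
    }
    return result;
}

// e.g. the f32 sign mask: every 32-bit lane becomes 0x80000000.
static_assert(BroadcastToU64(32, 0x80000000) == 0x8000000080000000);
```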
`emit_x64_floating_point` will utilize the same 16-byte broadcast
constants to encourage more cache hits within the constant pool between
vector and non-vector code.
`vfpclassp* k, xmm, i8` has better latency (4 -> 3) and executes on a
better port (ports 0/1 -> port 5), out of the way of the ALU ports,
compared to `vcmpunordp* xmm, xmm, xmm` (`vcmpp* xmm, xmm, xmm, i8`),
and removes the pipeline dependency on `xmm0` in favor of AVX512
`k`-mask registers.
`vblendmp* xmm, k, xmm, mem` has about the same throughput and latency
as `blendvp* xmm, mem`, but has the benefit of embedded broadcasts,
which reduce memory bandwidth (a 32/64-bit read rather than a 128-bit
one) and lend themselves to a future size-optimization feature of
`constant_pool`.
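A hedged sketch of the resulting pattern in xbyak syntax (register choices and constant layout are illustrative, not dynarmic's actual emitter):

```cpp
#include <xbyak/xbyak.h>

struct NanFixupSketch : Xbyak::CodeGenerator {
    NanFixupSketch() {
        Xbyak::Label default_nan;
        // Flag SNaN/QNaN lanes into k1; runs on port 5, away from the ALU ports.
        vfpclassps(k1, xmm0, 0b1000'0001);
        // Blend the default NaN into flagged lanes only, via a 32-bit
        // embedded-broadcast load instead of a full 128-bit constant read.
        vblendmps(xmm0 | k1, xmm0, ptr_b[rip + default_nan]);
        ret();
        L(default_nan);
        dd(0x7FC00000);  // canonical quiet NaN bit pattern
    }
};
```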
Both single- and double-precision floating point numbers, as well as
the packed and scalar versions of this instruction, are able to use the
same memory constant. This takes advantage of the fact that `VFIXUPIMM*`
doesn't just copy from the source: it will convert to `0.0` if the value
turns out to be a denormal and the `MXCSR.DAZ` flag is set.
```
tsrc[31:0] ← ((src1[30:23] = 0) AND (MXCSR.DAZ = 1)) ? 0.0 : src1[31:0]
...
CASE(token_response[3:0]) {
    ...
    0001: dest[31:0] ← tsrc[31:0]    ; pass through src1 normal input value, denormal as zero
    ...
```
There is an important subtlety that should be documented here: all
`FpFixup` responses that read from the `Src` register actually perform
a `DAZ` operation if `MXCSR.DAZ` is set.
Intended to be used by library users wishing to implement accurate memory watchpoints.
* A32: optionally make memory instructions the end of basic blocks
* A64: optionally make memory instructions the end of basic blocks
* Make memory halt checking user-configurable
* oops
AVX512 adds an additional **16** SIMD registers, for a total of 32 SIMD
registers, accessible by utilizing EVEX-encoded instructions. Rather
than using the `ScratchXmm` function, which adds register pressure and
spilling, AVX512-enabled contexts can directly use the `xmm{16-31}`
registers as intermediate scratch registers.
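A minimal sketch of the idea in xbyak syntax (not the actual emitter code):

```cpp
#include <xbyak/xbyak.h>

struct ScratchSketch : Xbyak::CodeGenerator {
    ScratchSketch() {
        // xmm16 is EVEX-only, so the register allocator never hands it out;
        // it can serve as scratch with no ScratchXmm() call and no spilling.
        vmovaps(xmm16, xmm0);       // stash the operand
        vaddps(xmm0, xmm16, xmm1);  // use it as a third source
        ret();
    }
};
```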
* Provide a reason for halting and update it atomically (see the sketch below).
* Allow user to specify a halt reason and return this information on halt.
* Check if halt was requested prior to starting execution.
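A hedged sketch of the mechanism (names and flag values are assumptions, not dynarmic's public API):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint32_t> halt_reason{0};

// Reasons are bit flags OR'd in atomically, so concurrent requesters
// cannot lose each other's reason.
void RequestHalt(std::uint32_t reason) {
    halt_reason.fetch_or(reason, std::memory_order_relaxed);
}

std::uint32_t Run() {
    // Check whether a halt was requested prior to starting execution.
    if (const auto reason = halt_reason.exchange(0); reason != 0) {
        return reason;
    }
    // ... execute JITed code until halt_reason becomes nonzero ...
    return halt_reason.exchange(0);
}
```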
`map` is an ordered structure with O(log n) searches.
`unordered_map` has O(1) average-time searches and O(n) in the worst
case, where a bucket holds colliding hashes and has to start chaining.
The unordered version should speed up our general case when looking up
constants.
I've added a trivial order-dependent (_(0,1) and (1,0) will return
different hashes_) hash that combines a 128-bit constant into a
64-bit hash which generally will not collide, using a bit-rotate to
preserve entropy.
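A sketch of such a combine (the exact rotate amount and operand order here are assumptions):

```cpp
#include <bit>
#include <cstdint>

// Order-dependent: CombineHash(0, 1) != CombineHash(1, 0), because only one
// half is rotated before the XOR; a rotate preserves all 64 bits of entropy
// where a shift would discard some.
constexpr std::uint64_t CombineHash(std::uint64_t lower, std::uint64_t upper) {
    return upper ^ std::rotl(lower, 1);
}

static_assert(CombineHash(0, 1) != CombineHash(1, 0));
```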
In MSVC, having files with identical filenames results in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h.
This makes dynarmic installable and also adds a CMake package config
file that allows projects to use `find_package(dynarmic)` to import the
library.
I know #636 adds the same thing, but while experimenting with the
different install options in
https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034
I ended up with a working patch, so I'm proposing this as well. This
implements solution 2.
This adds versioning information to the built library.
When building the shared library on Linux systems, a new object will
be created: libdynarmic.so.5
This is really useful when reasoning about ABI compatibility.
The variables `dynarmic_VERSION` and `dynarmic_VERSION_MAJOR`
are implicitly created when calling `project(dynarmic VERSION x.y.z)`.
Adds all elements of a vector and puts the result into the lowest element.
Accelerates the `addv` instruction with a vectorized implementation
rather than a serial one.
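A plain-C++ model of the reduction (illustrative; the actual change emits SIMD shuffles and adds):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fold the upper half onto the lower half log2(N) times instead of adding
// the N lanes serially. N is assumed to be a power of two.
template <std::size_t N>
constexpr std::uint32_t AddV(std::array<std::uint32_t, N> lanes) {
    for (std::size_t half = N / 2; half >= 1; half /= 2) {
        for (std::size_t i = 0; i < half; ++i) {
            lanes[i] += lanes[i + half];  // maps to e.g. a shuffle + vpaddd
        }
    }
    return lanes[0];  // the result lands in the lowest element
}

static_assert(AddV<4>({1, 2, 3, 4}) == 10);
```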
The lane-splatting variants of `FMUL` and `FMLA` are very
common in instruction streams when implementing things like
matrix multiplication. When they appear, they appear very densely.
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication
The way this is currently implemented is by grabbing the particular lane
into a general purpose register and then broadcasting it into a simd
register through `VectorGetElement` and `VectorBroadcast`.
```cpp
const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```
What could be done instead is to keep it within
the vector-register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR-round-trip.
This is implemented as the new IR instruction `VectorBroadcastElement`:
```cpp
const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
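For 32-bit lanes on x64, for example, this can lower to a single in-register shuffle (a hedged sketch in xbyak syntax, not the actual emitter):

```cpp
#include <xbyak/xbyak.h>

struct SplatSketch : Xbyak::CodeGenerator {
    explicit SplatSketch(int index) {
        // 0x55 * index repeats the 2-bit lane selector four times, so every
        // destination lane reads source lane `index` -- no GPR round-trip.
        pshufd(xmm0, xmm1, 0x55 * index);
        ret();
    }
};
```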
Recursive calls to `Replicate` beyond the first call might
cause an unintentional up-cast to an `int` type due
to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`.
This makes sure calls such as `Replicate<u8>` stay as the `u8` type
throughout.
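A minimal sketch of the fix (the helper's exact signature is assumed):

```cpp
#include <cstddef>
#include <cstdint>

// Without the static_cast, `(value << element_size) | value` on a
// std::uint8_t is promoted to int, and the recursion silently widens.
template <typename T>
constexpr T Replicate(T value, std::size_t element_size) {
    if (element_size >= sizeof(T) * 8) {
        return value;
    }
    const T doubled = static_cast<T>((value << element_size) | value);
    return Replicate(doubled, element_size * 2);
}

static_assert(Replicate<std::uint8_t>(0b01, 2) == 0b01010101);
```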
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we don't want to have to install xbyak before using it, we copy
the headers into the appropriate directory structure and use that instead.
AVX512 introduces _unsigned_ variants of the float-to-integer conversion
functions via `vcvttp{s,d}2u{dq,qq}`. In the case that a value is not
representable as an unsigned integer, the result is `0xFFFF...`,
which can be utilized to get "free" saturation when the floating point
value exceeds the unsigned range, after masking away negative values.
https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq
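A scalar model of the resulting saturation semantics for one f32 lane (illustrative; the real code operates on whole vectors and masks negative lanes):

```cpp
#include <cstdint>

// Out-of-range inputs make vcvttps2udq produce 0xFFFFFFFF (the unsigned
// "integer indefinite"), which is exactly saturation at the top end;
// masking negative (and NaN) lanes to zero completes the clamp.
std::uint32_t SaturatingF32ToU32(float x) {
    if (!(x > 0.0f)) {
        return 0;  // negative, zero, or NaN: masked to zero
    }
    if (x >= 4294967296.0f) {  // 2^32
        return 0xFFFFFFFFu;
    }
    return static_cast<std::uint32_t>(x);
}
```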
This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq