py-llvmlite: add upgrade candidate

Module Name:	pkgsrc-wip
Committed By:	Thomas Klausner <wiz@NetBSD.org>
Pushed By:	wiz
Date:		Tue Aug 27 01:02:49 2024 +0200
Changeset:	76831c7012601eb8fbff6860dde381e3af3a0dab

Added Files:
	py-llvmlite/DESCR
	py-llvmlite/Makefile
	py-llvmlite/PLIST
	py-llvmlite/TODO
	py-llvmlite/distinfo
	py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
	py-llvmlite/files/llvm14-remove-use-of-clonefile.patch
	py-llvmlite/files/llvm14-svml.patch
	py-llvmlite/log
	py-llvmlite/patches/patch-ffi_build.py

Log Message:
py-llvmlite: add upgrade candidate

Needs build fixing and porting to llvm 15.

To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=76831c7012601eb8fbff6860dde381e3af3a0dab

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

diffstat:
 py-llvmlite/DESCR                                  |   15 +
 py-llvmlite/Makefile                               |  105 +
 py-llvmlite/PLIST                                  |  117 ++
 py-llvmlite/TODO                                   |  131 ++
 py-llvmlite/distinfo                               |   15 +
 py-llvmlite/files/llvm14-clear-gotoffsetmap.patch  |   31 +
 .../files/llvm14-remove-use-of-clonefile.patch     |   54 +
 py-llvmlite/files/llvm14-svml.patch                | 2194 ++++++++++++++++++++
 py-llvmlite/log                                    |   44 +
 py-llvmlite/patches/patch-ffi_build.py             |   16 +
 10 files changed, 2722 insertions(+)

diffs:
diff --git a/py-llvmlite/DESCR b/py-llvmlite/DESCR
new file mode 100644
index 0000000000..3ac7fcdc36
--- /dev/null
+++ b/py-llvmlite/DESCR
@@ -0,0 +1,15 @@
+llvmlite provides a Python binding to LLVM for use in Numba.
+
+The old llvmpy binding exposes a lot of LLVM APIs, but its mapping of
+C++-style memory management onto Python is error-prone. Numba and many
+JIT compilers do not need a full LLVM API. Only the IR builder,
+optimizer, and JIT compiler APIs are necessary.
+
+llvmlite is a project originally tailored for Numba's needs, using the
+following approach:
+
+* A small C wrapper around the parts of the LLVM C++ API we need that
+  are not already exposed by the LLVM C API.
+* A ctypes Python wrapper around the C API.
+* A pure Python implementation of the subset of the LLVM IR builder
+  that we need for Numba.
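
As context for the DESCR above: the pure-Python IR builder layer can be
exercised on its own, without Numba and without touching the bundled
LLVM. A minimal sketch (not part of this commit; assumes the package is
installed):

    import llvmlite.ir as ir

    # Build IR for: double fadd(double a, double b) { return a + b; }
    module = ir.Module(name="example")
    fnty = ir.FunctionType(ir.DoubleType(), (ir.DoubleType(), ir.DoubleType()))
    func = ir.Function(module, fnty, name="fadd")
    builder = ir.IRBuilder(func.append_basic_block(name="entry"))
    builder.ret(builder.fadd(func.args[0], func.args[1], name="res"))
    print(module)  # textual LLVM IR; no LLVM library is involved yet
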
diff --git a/py-llvmlite/Makefile b/py-llvmlite/Makefile
new file mode 100644
index 0000000000..5e4bd2307b
--- /dev/null
+++ b/py-llvmlite/Makefile
@@ -0,0 +1,105 @@
+# $NetBSD: Makefile,v 1.26 2024/08/25 06:18:37 wiz Exp $
+
+DISTNAME=	llvmlite-0.43.0
+PKGNAME=	${PYPKGPREFIX}-${DISTNAME}
+CATEGORIES=	devel python
+MASTER_SITES=	${MASTER_SITE_PYPI:=l/llvmlite/}
+
+MAINTAINER=	pkgsrc-users@NetBSD.org
+HOMEPAGE=	https://llvmlite.readthedocs.io/
+COMMENT=	Lightweight LLVM Python binding for writing JIT compilers
+LICENSE=	2-clause-bsd
+
+# Statically link in a purpose-built LLVM, as upstream urges.
+# Each llvmlite release supports only one specific LLVM version, and
+# only with patches applied.
+# TODO: As of 0.43.0, this supports llvm 15.
+LLVM_VERSION=	14.0.6
+DISTFILES=	${DEFAULT_DISTFILES}
+DISTFILES+=	llvm-${LLVM_VERSION}.src.tar.xz
+DISTFILES+=	lld-${LLVM_VERSION}.src.tar.xz
+DISTFILES+=	libunwind-${LLVM_VERSION}.src.tar.xz
+
+LLVM_SITE=					https://github.com/llvm/llvm-project/releases/download/llvmorg-${LLVM_VERSION}/
+SITES.llvm-${LLVM_VERSION}.src.tar.xz=		${LLVM_SITE}
+SITES.lld-${LLVM_VERSION}.src.tar.xz=		${LLVM_SITE}
+SITES.libunwind-${LLVM_VERSION}.src.tar.xz=	${LLVM_SITE}
+
+USE_LANGUAGES=		c c++
+USE_CXX_FEATURES=	c++14
+# Just for LLVM build.
+USE_TOOLS=		cmake
+
+# See
+# https://github.com/numba/llvmlite/blob/main/conda-recipes/llvmdev/build.sh
+# for the procedure. This is what
+# https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html
+# points to. This needs to be matched up with the correct llvmlite
+# release, as upstream does not include it in the tarball. Python people
+# think building stuff from source is hard and keep it so :-/
+# I kept some upstream comments inline.
+
+LLVM_CMAKE_CONFIGURE_ARGS=	-DCMAKE_INSTALL_PREFIX=${WRKDIR}/llvm-inst
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DCMAKE_BUILD_TYPE:STRING=Release
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_ENABLE_PROJECTS:STRING=lld
+# We explicitly want static linking.
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DBUILD_SHARED_LIBS:BOOL=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_ENABLE_ASSERTIONS:BOOL=ON
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLINK_POLLY_INTO_TOOLS:BOOL=ON
+# We don't really require libxml2. Turn it off explicitly to avoid accidentally linking to system libs.
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_ENABLE_LIBXML2:BOOL=OFF
+# Urgh, llvm *really* wants to link to ncurses / terminfo and we *really* do not want it to.
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMINFO_CURSES=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_ENABLE_TERMINFO=OFF
+# Sometimes these are reported as unused. Whatever.
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMINFO_NCURSES=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMINFO_NCURSESW=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMINFO_TERMINFO=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMINFO_TINFO=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DHAVE_TERMIOS_H=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DCLANG_ENABLE_LIBXML=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLIBOMP_INSTALL_ALIASES=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_ENABLE_RTTI=OFF
+# Not sure if this should be adapted for pkgsrc.
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_TARGETS_TO_BUILD=all
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly
+# for llvm-lit
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_INCLUDE_UTILS=ON
+# benchmarks don't build without the rest of the LLVM project
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_INCLUDE_DOCS=OFF
+LLVM_CMAKE_CONFIGURE_ARGS+=	-DLLVM_INCLUDE_EXAMPLES=OFF
+
+
+MAKE_ENV+=		LLVM_CONFIG=${WRKDIR}/llvm-inst/bin/llvm-config
+# unable to pass LLVM bit-code files to linker
+MAKE_ENV.NetBSD+=	CXX_FLTO_FLAGS=
+MAKE_ENV.NetBSD+=	LD_FLTO_FLAGS=
+
+# The llvm build detects lots of stuff outside the build sandbox ...
+# a python it likes, git ... Let's hope this does not matter much
+# for the static lib used by llvmlite.
+
+pre-configure:
+	cd ${WRKDIR}/llvm-${LLVM_VERSION}.src && \
+	  for f in ${FILESDIR}/llvm*.patch; do patch -Np2 < $$f; done
+	${LN} -s llvm-${LLVM_VERSION}.src ${WRKDIR}/llvm
+	${LN} -s lld-${LLVM_VERSION}.src ${WRKDIR}/lld
+	${LN} -s libunwind-${LLVM_VERSION}.src ${WRKDIR}/libunwind
+	cd ${WRKDIR} && mkdir build && cd build && \
+	  cmake -G'Unix Makefiles' ${LLVM_CMAKE_CONFIGURE_ARGS} ../llvm && \
+	  ${MAKE} -j${MAKE_JOBS} && \
+	  ${MAKE} -j${MAKE_JOBS} check-llvm-unit && \
+	  ${MAKE} install
+	${SED} -e 's/ -stdlib=libc++//' ${WRKSRC}/ffi/Makefile.freebsd > ${WRKSRC}/ffi/Makefile.netbsd
+
+.include "../../mk/bsd.prefs.mk"
+post-install:
+.if ${OPSYS} == "Darwin"
+	install_name_tool -id \
+		${PREFIX}/${PYSITELIB}/llvmlite/binding/libllvmlite.dylib \
+		${DESTDIR}${PREFIX}/${PYSITELIB}/llvmlite/binding/libllvmlite.dylib
+.endif
+
+.include "../../lang/python/egg.mk"
+.include "../../mk/bsd.pkg.mk"
diff --git a/py-llvmlite/PLIST b/py-llvmlite/PLIST
new file mode 100644
index 0000000000..984aa1a8a6
--- /dev/null
+++ b/py-llvmlite/PLIST
@@ -0,0 +1,117 @@
+@comment $NetBSD: PLIST,v 1.7 2024/01/24 15:25:12 thor Exp $
+${PYSITELIB}/${EGG_INFODIR}/PKG-INFO
+${PYSITELIB}/${EGG_INFODIR}/SOURCES.txt
+${PYSITELIB}/${EGG_INFODIR}/dependency_links.txt
+${PYSITELIB}/${EGG_INFODIR}/top_level.txt
+${PYSITELIB}/llvmlite/__init__.py
+${PYSITELIB}/llvmlite/__init__.pyc
+${PYSITELIB}/llvmlite/__init__.pyo
+${PYSITELIB}/llvmlite/_version.py
+${PYSITELIB}/llvmlite/_version.pyc
+${PYSITELIB}/llvmlite/_version.pyo
+${PYSITELIB}/llvmlite/binding/__init__.py
+${PYSITELIB}/llvmlite/binding/__init__.pyc
+${PYSITELIB}/llvmlite/binding/__init__.pyo
+${PYSITELIB}/llvmlite/binding/analysis.py
+${PYSITELIB}/llvmlite/binding/analysis.pyc
+${PYSITELIB}/llvmlite/binding/analysis.pyo
+${PYSITELIB}/llvmlite/binding/common.py
+${PYSITELIB}/llvmlite/binding/common.pyc
+${PYSITELIB}/llvmlite/binding/common.pyo
+${PYSITELIB}/llvmlite/binding/context.py
+${PYSITELIB}/llvmlite/binding/context.pyc
+${PYSITELIB}/llvmlite/binding/context.pyo
+${PYSITELIB}/llvmlite/binding/dylib.py
+${PYSITELIB}/llvmlite/binding/dylib.pyc
+${PYSITELIB}/llvmlite/binding/dylib.pyo
+${PYSITELIB}/llvmlite/binding/executionengine.py
+${PYSITELIB}/llvmlite/binding/executionengine.pyc
+${PYSITELIB}/llvmlite/binding/executionengine.pyo
+${PYSITELIB}/llvmlite/binding/ffi.py
+${PYSITELIB}/llvmlite/binding/ffi.pyc
+${PYSITELIB}/llvmlite/binding/ffi.pyo
+${PYSITELIB}/llvmlite/binding/initfini.py
+${PYSITELIB}/llvmlite/binding/initfini.pyc
+${PYSITELIB}/llvmlite/binding/initfini.pyo
+${PYSITELIB}/llvmlite/binding/libllvmlite.so
+${PYSITELIB}/llvmlite/binding/linker.py
+${PYSITELIB}/llvmlite/binding/linker.pyc
+${PYSITELIB}/llvmlite/binding/linker.pyo
+${PYSITELIB}/llvmlite/binding/module.py
+${PYSITELIB}/llvmlite/binding/module.pyc
+${PYSITELIB}/llvmlite/binding/module.pyo
+${PYSITELIB}/llvmlite/binding/object_file.py
+${PYSITELIB}/llvmlite/binding/object_file.pyc
+${PYSITELIB}/llvmlite/binding/object_file.pyo
+${PYSITELIB}/llvmlite/binding/options.py
+${PYSITELIB}/llvmlite/binding/options.pyc
+${PYSITELIB}/llvmlite/binding/options.pyo
+${PYSITELIB}/llvmlite/binding/orcjit.py
+${PYSITELIB}/llvmlite/binding/orcjit.pyc
+${PYSITELIB}/llvmlite/binding/orcjit.pyo
+${PYSITELIB}/llvmlite/binding/passmanagers.py
+${PYSITELIB}/llvmlite/binding/passmanagers.pyc
+${PYSITELIB}/llvmlite/binding/passmanagers.pyo
+${PYSITELIB}/llvmlite/binding/targets.py
+${PYSITELIB}/llvmlite/binding/targets.pyc
+${PYSITELIB}/llvmlite/binding/targets.pyo
+${PYSITELIB}/llvmlite/binding/transforms.py
+${PYSITELIB}/llvmlite/binding/transforms.pyc
+${PYSITELIB}/llvmlite/binding/transforms.pyo
+${PYSITELIB}/llvmlite/binding/value.py
+${PYSITELIB}/llvmlite/binding/value.pyc
+${PYSITELIB}/llvmlite/binding/value.pyo
+${PYSITELIB}/llvmlite/ir/__init__.py
+${PYSITELIB}/llvmlite/ir/__init__.pyc
+${PYSITELIB}/llvmlite/ir/__init__.pyo
+${PYSITELIB}/llvmlite/ir/_utils.py
+${PYSITELIB}/llvmlite/ir/_utils.pyc
+${PYSITELIB}/llvmlite/ir/_utils.pyo
+${PYSITELIB}/llvmlite/ir/builder.py
+${PYSITELIB}/llvmlite/ir/builder.pyc
+${PYSITELIB}/llvmlite/ir/builder.pyo
+${PYSITELIB}/llvmlite/ir/context.py
+${PYSITELIB}/llvmlite/ir/context.pyc
+${PYSITELIB}/llvmlite/ir/context.pyo
+${PYSITELIB}/llvmlite/ir/instructions.py
+${PYSITELIB}/llvmlite/ir/instructions.pyc
+${PYSITELIB}/llvmlite/ir/instructions.pyo
+${PYSITELIB}/llvmlite/ir/module.py
+${PYSITELIB}/llvmlite/ir/module.pyc
+${PYSITELIB}/llvmlite/ir/module.pyo
+${PYSITELIB}/llvmlite/ir/transforms.py
+${PYSITELIB}/llvmlite/ir/transforms.pyc
+${PYSITELIB}/llvmlite/ir/transforms.pyo
+${PYSITELIB}/llvmlite/ir/types.py
+${PYSITELIB}/llvmlite/ir/types.pyc
+${PYSITELIB}/llvmlite/ir/types.pyo
+${PYSITELIB}/llvmlite/ir/values.py
+${PYSITELIB}/llvmlite/ir/values.pyc
+${PYSITELIB}/llvmlite/ir/values.pyo
+${PYSITELIB}/llvmlite/tests/__init__.py
+${PYSITELIB}/llvmlite/tests/__init__.pyc
+${PYSITELIB}/llvmlite/tests/__init__.pyo
+${PYSITELIB}/llvmlite/tests/__main__.py
+${PYSITELIB}/llvmlite/tests/__main__.pyc
+${PYSITELIB}/llvmlite/tests/__main__.pyo
+${PYSITELIB}/llvmlite/tests/customize.py
+${PYSITELIB}/llvmlite/tests/customize.pyc
+${PYSITELIB}/llvmlite/tests/customize.pyo
+${PYSITELIB}/llvmlite/tests/refprune_proto.py
+${PYSITELIB}/llvmlite/tests/refprune_proto.pyc
+${PYSITELIB}/llvmlite/tests/refprune_proto.pyo
+${PYSITELIB}/llvmlite/tests/test_binding.py
+${PYSITELIB}/llvmlite/tests/test_binding.pyc
+${PYSITELIB}/llvmlite/tests/test_binding.pyo
+${PYSITELIB}/llvmlite/tests/test_ir.py
+${PYSITELIB}/llvmlite/tests/test_ir.pyc
+${PYSITELIB}/llvmlite/tests/test_ir.pyo
+${PYSITELIB}/llvmlite/tests/test_refprune.py
+${PYSITELIB}/llvmlite/tests/test_refprune.pyc
+${PYSITELIB}/llvmlite/tests/test_refprune.pyo
+${PYSITELIB}/llvmlite/tests/test_valuerepr.py
+${PYSITELIB}/llvmlite/tests/test_valuerepr.pyc
+${PYSITELIB}/llvmlite/tests/test_valuerepr.pyo
+${PYSITELIB}/llvmlite/utils.py
+${PYSITELIB}/llvmlite/utils.pyc
+${PYSITELIB}/llvmlite/utils.pyo
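
Since the PLIST ships the llvmlite/tests package (including
__main__.py), the installed package can be smoke-tested directly, e.g.:

    import subprocess
    import sys

    # Run llvmlite's bundled test suite against the installed package.
    subprocess.run([sys.executable, "-m", "llvmlite.tests"], check=True)
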
diff --git a/py-llvmlite/TODO b/py-llvmlite/TODO
new file mode 100644
index 0000000000..851353f965
--- /dev/null
+++ b/py-llvmlite/TODO
@@ -0,0 +1,131 @@
+Upgrade candidate.
+
+1. adapt to use LLVM 15
+
+2. fix build:
+
+[100%] Built target MIRTests
+[100%] Built target LLVMExegesisTests
+[100%] Built target GlobalISelTests
+[100%] Linking CXX executable AMDGPUTests
+[100%] Built target AMDGPUTests
+[100%] Linking CXX executable CodeGenTests
+[100%] Linking CXX executable DebugInfoDWARFTests
+[100%] Built target CodeGenTests
+[100%] Built target DebugInfoDWARFTests
+[100%] Built target UnitTests
+[100%] Running lit suite /scratch/devel/py-llvmlite/work/llvm/test/Unit
+/scratch/devel/py-llvmlite/work/llvm-14.0.6.src/utils/lit/lit/formats/googletest.py:57: SyntaxWarning: invalid escape sequence '\('
+  upstream_prefix = re.compile('Running main\(\) from .*gtest_main\.cc')
+/scratch/devel/py-llvmlite/work/llvm-14.0.6.src/utils/lit/lit/TestRunner.py:181: SyntaxWarning: invalid escape sequence '\c'
+  """
+llvm-lit: /scratch/devel/py-llvmlite/work/llvm-14.0.6.src/utils/lit/lit/run.py:126: note: Raised process limit from 1024 to 1044
+-- Testing: 6458 tests, 32 workers --
+Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.
+FAIL: LLVM-Unit :: Support/./SupportTests/FileSystemTest.RealPath (5034 of 6458)
+******************** TEST 'LLVM-Unit :: Support/./SupportTests/FileSystemTest.RealPath' FAILED ********************
+Script:
+--
+/scratch/devel/py-llvmlite/work/build/unittests/Support/./SupportTests --gtest_filter=FileSystemTest.RealPath
+--
+Note: Google Test filter = FileSystemTest.RealPath
+[==========] Running 1 test from 1 test suite.
+[----------] Global test environment set-up.
+[----------] 1 test from FileSystemTest
+[ RUN      ] FileSystemTest.RealPath
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:767: Failure
+fs::real_path(HomeDir, Expected): did not return errc::success.
+error number: 2
+error message: No such file or directory
+
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:681: Failure
+fs::remove(TestDirectory.str()): did not return errc::success.
+error number: 66
+error message: Directory not empty
+
+[  FAILED  ] FileSystemTest.RealPath (0 ms)
+[----------] 1 test from FileSystemTest (0 ms total)
+
+[----------] Global test environment tear-down
+[==========] 1 test from 1 test suite ran. (0 ms total)
+[  PASSED  ] 0 tests.
+[  FAILED  ] 1 test, listed below:
+[  FAILED  ] FileSystemTest.RealPath
+
+ 1 FAILED TEST
+Test Directory: /tmp/lit-tmp-r1t29qpx/file-system-test-cdc719
+
+********************
+Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70..
+FAIL: LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions (5055 of 6458)
+******************** TEST 'LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions' FAILED ********************
+Script:
+--
+/scratch/devel/py-llvmlite/work/build/unittests/Support/./SupportTests --gtest_filter=FileSystemTest.permissions
+--
+Note: Google Test filter = FileSystemTest.permissions
+[==========] Running 1 test from 1 test suite.
+[----------] Global test environment set-up.
+[----------] 1 test from FileSystemTest
+[ RUN      ] FileSystemTest.permissions
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:2271: Failure
+Expected equality of these values:
+  fs::setPermissions(TempPath, fs::set_gid_on_exe)
+    Which is: generic:1
+  NoError
+    Which is: system:0
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:2272: Failure
+Value of: CheckPermissions(fs::set_gid_on_exe)
+  Actual: false
+Expected: true
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:2301: Failure
+Expected equality of these values:
+  fs::setPermissions(TempPath, fs::all_perms & ~fs::sticky_bit)
+    Which is: generic:1
+  NoError
+    Which is: system:0
+/scratch/devel/py-llvmlite/work/llvm/unittests/Support/Path.cpp:2303: Failure
+Value of: CheckPermissions(fs::all_perms & ~fs::sticky_bit)
+  Actual: false
+Expected: true
+[  FAILED  ] FileSystemTest.permissions (0 ms)
+[----------] 1 test from FileSystemTest (0 ms total)
+
+[----------] Global test environment tear-down
+[==========] 1 test from 1 test suite ran. (0 ms total)
+[  PASSED  ] 0 tests.
+[  FAILED  ] 1 test, listed below:
+[  FAILED  ] FileSystemTest.permissions
+
+ 1 FAILED TEST
+Test Directory: /tmp/lit-tmp-r1t29qpx/file-system-test-695917
+
+********************
+Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
+********************
+Failed Tests (2):
+  LLVM-Unit :: Support/./SupportTests/FileSystemTest.RealPath
+  LLVM-Unit :: Support/./SupportTests/FileSystemTest.permissions
+
+
+Testing Time: 6.90s
+  Passed: 6456
+  Failed:    2
+--- test/CMakeFiles/check-llvm-unit ---
+*** [test/CMakeFiles/check-llvm-unit] Error code 1
+
+make[5]: stopped making "test/CMakeFiles/check-llvm-unit.dir/build" in /scratch/devel/py-llvmlite/work/build
+make[5]: 1 error
+
+make[5]: stopped making "test/CMakeFiles/check-llvm-unit.dir/build" in /scratch/devel/py-llvmlite/work/build
+--- test/CMakeFiles/check-llvm-unit.dir/all ---
+*** [test/CMakeFiles/check-llvm-unit.dir/all] Error code 2
+
+make[4]: stopped making "test/CMakeFiles/check-llvm-unit.dir/all" in /scratch/devel/py-llvmlite/work/build
+make[4]: 1 error
+
+make[4]: stopped making "test/CMakeFiles/check-llvm-unit.dir/all" in /scratch/devel/py-llvmlite/work/build
+--- test/CMakeFiles/check-llvm-unit.dir/rule ---
+*** [test/CMakeFiles/check-llvm-unit.dir/rule] Error code 2
+
+make[3]: stopped making "check-llvm-unit" in /scratch/devel/py-llvmlite/work/build
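
The FileSystemTest.RealPath failure above is fs::real_path() on the
home directory returning ENOENT (error number 2), which would be
consistent with HOME being unset or pointing at a nonexistent directory
inside the build sandbox (an assumption, not verified). A quick check
from within the sandbox:

    import os
    import os.path

    # Hypothetical sanity check: the LLVM unit test resolves $HOME,
    # which fails with ENOENT if HOME is unset or dangling in the sandbox.
    home = os.environ.get("HOME")
    print("HOME =", home)
    print("exists:", home is not None and os.path.isdir(home))
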
diff --git a/py-llvmlite/distinfo b/py-llvmlite/distinfo
new file mode 100644
index 0000000000..77574e2050
--- /dev/null
+++ b/py-llvmlite/distinfo
@@ -0,0 +1,15 @@
+$NetBSD: distinfo,v 1.22 2024/01/24 15:25:12 thor Exp $
+
+BLAKE2s (libunwind-14.0.6.src.tar.xz) = 21da632762db6524a46c1f721908b233265afe83728c1de5dd7757c662db0d99
+SHA512 (libunwind-14.0.6.src.tar.xz) = c8f3804c47ac33273238899e5682f9cb52465dcceff0e0ecf9925469620c6c9a62cc2c708a35a0e156b666e1198df52c5fff1da9d5ee3194605dfd62c296b058
+Size (libunwind-14.0.6.src.tar.xz) = 108680 bytes
+BLAKE2s (lld-14.0.6.src.tar.xz) = 2fc265b616bbdbaeecc8385fda204dbc28b1d871d98f4b3b3cd5183c4d6eefc8
+SHA512 (lld-14.0.6.src.tar.xz) = fad97b441f9642b73edd240af2c026259de0951d5ace42779e9e0fcf5e417252a1d744e2fc51e754a45016621ba0c70088177f88695af1c6ce290dd26873b094
+Size (lld-14.0.6.src.tar.xz) = 1366180 bytes
+BLAKE2s (llvm-14.0.6.src.tar.xz) = 2d44946453add45426569fd4187654f83881341c5c0109e4ffacc60e8f73af60
+SHA512 (llvm-14.0.6.src.tar.xz) = 6461bdde27aac17fa44c3e99a85ec47ffb181d0d4e5c3ef1c4286a59583e3b0c51af3c8081a300f45b99524340773a3011380059e3b3a571c3b0a8733e96fc1d
+Size (llvm-14.0.6.src.tar.xz) = 49660136 bytes
+BLAKE2s (llvmlite-0.43.0.tar.gz) = 379d69e08053b7c6d604a31a396496e9c05ed12899f07a671117b0d7add292e7
+SHA512 (llvmlite-0.43.0.tar.gz) = 82fc43e2b4be22ca5de5fe5ea850c4d363fe6ff7dd8f34e523bebb5b9ff5bb41a591f112b1732fab3cf60c6248aa157ed58962c58d91eedf01857fa3b4877c5b
+Size (llvmlite-0.43.0.tar.gz) = 157069 bytes
+SHA1 (patch-ffi_build.py) = 74ca5c04fa8da08768e883eb58c15856aed67f2a
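
If needed, the digests recorded in distinfo can be cross-checked
outside of pkgsrc; a minimal sketch (assuming pkgsrc's BLAKE2s is the
default 32-byte digest of Python's hashlib.blake2s):

    import hashlib

    def distinfo_digests(path):
        """Compute the BLAKE2s/SHA512 digests and size recorded in distinfo."""
        b2, sha, size = hashlib.blake2s(), hashlib.sha512(), 0
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                b2.update(chunk)
                sha.update(chunk)
                size += len(chunk)
        return b2.hexdigest(), sha.hexdigest(), size

    print(distinfo_digests("llvmlite-0.43.0.tar.gz"))
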
diff --git a/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch b/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
new file mode 100644
index 0000000000..239f4ab20c
--- /dev/null
+++ b/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
@@ -0,0 +1,31 @@
+From 322c79fff224389b4df9f24ac22965867007c2fa Mon Sep 17 00:00:00 2001
+From: Graham Markall <gmarkall@nvidia.com>
+Date: Mon, 13 Mar 2023 21:35:11 +0000
+Subject: [PATCH] RuntimeDyldELF: Clear the GOTOffsetMap when finalizing the
+ load
+
+This needs resetting so that stale entries are not left behind when the
+GOT section and index are reset.
+
+See llvm/llvm#61402: RuntimeDyldELF doesn't clear GOTOffsetMap in
+finalizeLoad(), leading to invalid GOT relocations on AArch64 -
+https://github.com/llvm/llvm-project/issues/61402.
+---
+ llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp b/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
+index f92618afdff6..eb3c27a9406a 100644
+--- a/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
++++ b/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp
+@@ -2345,6 +2345,7 @@ Error RuntimeDyldELF::finalizeLoad(const ObjectFile &Obj,
+     }
+   }
+ 
++  GOTOffsetMap.clear();
+   GOTSectionID = 0;
+   CurrentGOTIndex = 0;
+ 
+-- 
+2.34.1
+
diff --git a/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch b/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch
new file mode 100644
index 0000000000..6ef9c9d61b
--- /dev/null
+++ b/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch
@@ -0,0 +1,54 @@
+diff -ur a/llvm-14.0.6.src/lib/Support/Unix/Path.inc b/llvm-14.0.6.src/lib/Support/Unix/Path.inc
+--- a/llvm-14.0.6.src/lib/Support/Unix/Path.inc	2022-03-14 05:44:55.000000000 -0400
++++ b/llvm-14.0.6.src/lib/Support/Unix/Path.inc	2022-09-19 11:30:59.000000000 -0400
+@@ -1462,6 +1462,7 @@
+ std::error_code copy_file(const Twine &From, const Twine &To) {
+   std::string FromS = From.str();
+   std::string ToS = To.str();
++  /*
+ #if __has_builtin(__builtin_available)
+   if (__builtin_available(macos 10.12, *)) {
+     // Optimistically try to use clonefile() and handle errors, rather than
+@@ -1490,6 +1491,7 @@
+     // cheaper.
+   }
+ #endif
++  */
+   if (!copyfile(FromS.c_str(), ToS.c_str(), /*State=*/NULL, COPYFILE_DATA))
+     return std::error_code();
+   return std::error_code(errno, std::generic_category());
+diff -ur a/llvm-14.0.6.src/unittests/Support/Path.cpp b/llvm-14.0.6.src/unittests/Support/Path.cpp
+--- a/llvm-14.0.6.src/unittests/Support/Path.cpp	2022-03-14 05:44:55.000000000 -0400
++++ b/llvm-14.0.6.src/unittests/Support/Path.cpp	2022-09-19 11:33:07.000000000 -0400
+@@ -2267,15 +2267,15 @@
+ 
+   EXPECT_EQ(fs::setPermissions(TempPath, fs::set_uid_on_exe), NoError);
+   EXPECT_TRUE(CheckPermissions(fs::set_uid_on_exe));
+-
++#if !defined(__APPLE__)
+   EXPECT_EQ(fs::setPermissions(TempPath, fs::set_gid_on_exe), NoError);
+   EXPECT_TRUE(CheckPermissions(fs::set_gid_on_exe));
+-
++#endif
+   // Modern BSDs require root to set the sticky bit on files.
+   // AIX and Solaris without root will mask off (i.e., lose) the sticky bit
+   // on files.
+ #if !defined(__FreeBSD__) && !defined(__NetBSD__) && !defined(__OpenBSD__) &&  \
+-    !defined(_AIX) && !(defined(__sun__) && defined(__svr4__))
++    !defined(_AIX) && !(defined(__sun__) && defined(__svr4__)) && !defined(__APPLE__)
+   EXPECT_EQ(fs::setPermissions(TempPath, fs::sticky_bit), NoError);
+   EXPECT_TRUE(CheckPermissions(fs::sticky_bit));
+ 
+@@ -2297,10 +2297,12 @@
+   EXPECT_TRUE(CheckPermissions(fs::all_perms));
+ #endif // !FreeBSD && !NetBSD && !OpenBSD && !AIX
+ 
++#if !defined(__APPLE__)
+   EXPECT_EQ(fs::setPermissions(TempPath, fs::all_perms & ~fs::sticky_bit),
+                                NoError);
+   EXPECT_TRUE(CheckPermissions(fs::all_perms & ~fs::sticky_bit));
+ #endif
++#endif
+ }
+ 
+ #ifdef _WIN32
diff --git a/py-llvmlite/files/llvm14-svml.patch b/py-llvmlite/files/llvm14-svml.patch
new file mode 100644
index 0000000000..c753d3f597
--- /dev/null
+++ b/py-llvmlite/files/llvm14-svml.patch
@@ -0,0 +1,2194 @@
+From 9de32f5474f1f78990b399214bdbb6c21f8f098e Mon Sep 17 00:00:00 2001
+From: Ivan Butygin <ivan.butygin@gmail.com>
+Date: Sun, 24 Jul 2022 20:31:29 +0200
+Subject: [PATCH] Fixes vectorizer and extends SVML support
+
+Fixes vectorizer and extends SVML support
+Patch was updated to fix SVML calling convention issues uncovered by llvm 10.
+In previous versions of patch SVML calling convention was selected based on
+compilation settings. So if you try to call 256bit vector function from avx512
+code function will be called with avx512 cc which is incorrect. To fix this
+SVML cc was separated into 3 different cc for 128, 256 and 512bit vector lengths
+which are selected based on actual input vector length.
+
+Original patch merged several fixes:
+
+1. https://reviews.llvm.org/D47188 patch fixes the problem with improper calls
+to SVML library as it has non-standard calling conventions. So accordingly it
+has SVML calling conventions definitions and code to set CC to the vectorized
+calls. As SVML provides several implementations for the math functions we also
+took into consideration fast attribute and select more fast implementation in
+such case. This work is based on original Matt Masten's work.
+Author: Denis Nagorny
+
+2. https://reviews.llvm.org/D53035 patch implements support to legalize SVML
+calls by breaking down the illegal vector call instruction into multiple legal
+vector call instructions during code generation. Currently the vectorizer does
+not check legality of the generated SVML (or any VECLIB) call instructions, and
+this can lead to potential problems even during vector type legalization. This
+patch addresses this issue by adding a legality check during code generation and
+replaces the illegal SVML call with corresponding legalized instructions.
+(RFC: http://lists.llvm.org/pipermail/llvm-dev/2018-June/124357.html)
+Author: Karthik Senthil
+
+diff --git a/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h b/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h
+index 17d1e3f770c14..110ff08189867 100644
+--- a/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h
++++ b/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h
+@@ -39,6 +39,12 @@ struct VecDesc {
+     NotLibFunc
+   };
+ 
++enum SVMLAccuracy {
++  SVML_DEFAULT,
++  SVML_HA,
++  SVML_EP
++};
++
+ /// Implementation of the target library information.
+ ///
+ /// This class constructs tables that hold the target library information and
+@@ -157,7 +163,7 @@ class TargetLibraryInfoImpl {
+   /// Return true if the function F has a vector equivalent with vectorization
+   /// factor VF.
+   bool isFunctionVectorizable(StringRef F, const ElementCount &VF) const {
+-    return !getVectorizedFunction(F, VF).empty();
++    return !getVectorizedFunction(F, VF, false).empty();
+   }
+ 
+   /// Return true if the function F has a vector equivalent with any
+@@ -166,7 +172,10 @@ class TargetLibraryInfoImpl {
+ 
+   /// Return the name of the equivalent of F, vectorized with factor VF. If no
+   /// such mapping exists, return the empty string.
+-  StringRef getVectorizedFunction(StringRef F, const ElementCount &VF) const;
++  std::string getVectorizedFunction(StringRef F, const ElementCount &VF, bool IsFast) const;
++
++  Optional<CallingConv::ID> getVectorizedFunctionCallingConv(
++    StringRef F, const FunctionType &FTy, const DataLayout &DL) const;
+ 
+   /// Set to true iff i32 parameters to library functions should have signext
+   /// or zeroext attributes if they correspond to C-level int or unsigned int,
+@@ -326,8 +335,13 @@ class TargetLibraryInfo {
+   bool isFunctionVectorizable(StringRef F) const {
+     return Impl->isFunctionVectorizable(F);
+   }
+-  StringRef getVectorizedFunction(StringRef F, const ElementCount &VF) const {
+-    return Impl->getVectorizedFunction(F, VF);
++  std::string getVectorizedFunction(StringRef F, const ElementCount &VF, bool IsFast) const {
++    return Impl->getVectorizedFunction(F, VF, IsFast);
++  }
++
++  Optional<CallingConv::ID> getVectorizedFunctionCallingConv(
++    StringRef F, const FunctionType &FTy, const DataLayout &DL) const {
++    return Impl->getVectorizedFunctionCallingConv(F, FTy, DL);
+   }
+ 
+   /// Tests if the function is both available and a candidate for optimized code
+diff --git a/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h b/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h
+index 78ebb35e0ea4d..3ffb57db8b18b 100644
+--- a/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h
++++ b/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h
+@@ -133,6 +133,9 @@ enum Kind {
+   kw_fastcc,
+   kw_coldcc,
+   kw_intel_ocl_bicc,
++  kw_intel_svmlcc128,
++  kw_intel_svmlcc256,
++  kw_intel_svmlcc512,
+   kw_cfguard_checkcc,
+   kw_x86_stdcallcc,
+   kw_x86_fastcallcc,
+diff --git a/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt b/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt
+index 0498fc269b634..23bb3de41bc1a 100644
+--- a/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt
++++ b/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt
+@@ -20,3 +20,7 @@ tablegen(LLVM IntrinsicsX86.h -gen-intrinsic-enums -intrinsic-prefix=x86)
+ tablegen(LLVM IntrinsicsXCore.h -gen-intrinsic-enums -intrinsic-prefix=xcore)
+ tablegen(LLVM IntrinsicsVE.h -gen-intrinsic-enums -intrinsic-prefix=ve)
+ add_public_tablegen_target(intrinsics_gen)
++
++set(LLVM_TARGET_DEFINITIONS SVML.td)
++tablegen(LLVM SVML.inc -gen-svml)
++add_public_tablegen_target(svml_gen)
+diff --git a/llvm-14.0.6.src/include/llvm/IR/CallingConv.h b/llvm-14.0.6.src/include/llvm/IR/CallingConv.h
+index fd28542465225..096eea1a8e19b 100644
+--- a/llvm-14.0.6.src/include/llvm/IR/CallingConv.h
++++ b/llvm-14.0.6.src/include/llvm/IR/CallingConv.h
+@@ -252,6 +252,11 @@ namespace CallingConv {
+     /// M68k_INTR - Calling convention used for M68k interrupt routines.
+     M68k_INTR = 101,
+ 
++    /// Intel_SVML - Calling conventions for Intel Short Math Vector Library
++    Intel_SVML128 = 102,
++    Intel_SVML256 = 103,
++    Intel_SVML512 = 104,
++
+     /// The highest possible calling convention ID. Must be some 2^k - 1.
+     MaxID = 1023
+   };
+diff --git a/llvm-14.0.6.src/include/llvm/IR/SVML.td b/llvm-14.0.6.src/include/llvm/IR/SVML.td
+new file mode 100644
+index 0000000000000..5af710404c9d9
+--- /dev/null
++++ b/llvm-14.0.6.src/include/llvm/IR/SVML.td
+@@ -0,0 +1,62 @@
++//===-- Intel_SVML.td - Defines SVML call variants ---------*- tablegen -*-===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++// This file is used by TableGen to define the different typs of SVML function
++// variants used with -fveclib=SVML.
++//
++//===----------------------------------------------------------------------===//
++
++class SvmlVariant;
++
++def sin        : SvmlVariant;
++def cos        : SvmlVariant;
++def pow        : SvmlVariant;
++def exp        : SvmlVariant;
++def log        : SvmlVariant;
++def acos       : SvmlVariant;
++def acosh      : SvmlVariant;
++def asin       : SvmlVariant;
++def asinh      : SvmlVariant;
++def atan2      : SvmlVariant;
++def atan       : SvmlVariant;
++def atanh      : SvmlVariant;
++def cbrt       : SvmlVariant;
++def cdfnorm    : SvmlVariant;
++def cdfnorminv : SvmlVariant;
++def cosd       : SvmlVariant;
++def cosh       : SvmlVariant;
++def erf        : SvmlVariant;
++def erfc       : SvmlVariant;
++def erfcinv    : SvmlVariant;
++def erfinv     : SvmlVariant;
++def exp10      : SvmlVariant;
++def exp2       : SvmlVariant;
++def expm1      : SvmlVariant;
++def hypot      : SvmlVariant;
++def invsqrt    : SvmlVariant;
++def log10      : SvmlVariant;
++def log1p      : SvmlVariant;
++def log2       : SvmlVariant;
++def sind       : SvmlVariant;
++def sinh       : SvmlVariant;
++def sqrt       : SvmlVariant;
++def tan        : SvmlVariant;
++def tanh       : SvmlVariant;
++
++// TODO: SVML does not currently provide _ha and _ep variants of these fucnctions.
++// We should call the default variant of these functions in all cases instead.
++
++// def nearbyint  : SvmlVariant;
++// def logb       : SvmlVariant;
++// def floor      : SvmlVariant;
++// def fmod       : SvmlVariant;
++// def ceil       : SvmlVariant;
++// def trunc      : SvmlVariant;
++// def rint       : SvmlVariant;
++// def round      : SvmlVariant;
+diff --git a/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt b/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt
+index aec84124129f4..98286e166fbe2 100644
+--- a/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt
++++ b/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt
+@@ -150,6 +150,7 @@ add_llvm_component_library(LLVMAnalysis
+   DEPENDS
+   intrinsics_gen
+   ${MLDeps}
++  svml_gen
+ 
+   LINK_LIBS
+   ${MLLinkDeps}
+diff --git a/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp b/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp
+index 02923c2c7eb14..83abde28a62a4 100644
+--- a/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp
++++ b/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp
+@@ -110,6 +110,11 @@ bool TargetLibraryInfoImpl::isCallingConvCCompatible(Function *F) {
+                                     F->getFunctionType());
+ }
+ 
++static std::string svmlMangle(StringRef FnName, const bool IsFast) {
++  std::string FullName = FnName.str();
++  return IsFast ? FullName : FullName + "_ha";
++}
++
+ /// Initialize the set of available library functions based on the specified
+ /// target triple. This should be carefully written so that a missing target
+ /// triple gets a sane set of defaults.
+@@ -1876,8 +1881,9 @@ void TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib(
+   }
+   case SVML: {
+     const VecDesc VecFuncs[] = {
+-    #define TLI_DEFINE_SVML_VECFUNCS
+-    #include "llvm/Analysis/VecFuncs.def"
++    #define GET_SVML_VARIANTS
++    #include "llvm/IR/SVML.inc"
++    #undef GET_SVML_VARIANTS
+     };
+     addVectorizableFunctions(VecFuncs);
+     break;
+@@ -1897,20 +1903,51 @@ bool TargetLibraryInfoImpl::isFunctionVectorizable(StringRef funcName) const {
+   return I != VectorDescs.end() && StringRef(I->ScalarFnName) == funcName;
+ }
+ 
+-StringRef
+-TargetLibraryInfoImpl::getVectorizedFunction(StringRef F,
+-                                             const ElementCount &VF) const {
++std::string TargetLibraryInfoImpl::getVectorizedFunction(StringRef F,
++                                                         const ElementCount &VF,
++                                                         bool IsFast) const {
++  bool FromSVML = ClVectorLibrary == SVML;
+   F = sanitizeFunctionName(F);
+   if (F.empty())
+-    return F;
++    return F.str();
+   std::vector<VecDesc>::const_iterator I =
+       llvm::lower_bound(VectorDescs, F, compareWithScalarFnName);
+   while (I != VectorDescs.end() && StringRef(I->ScalarFnName) == F) {
+-    if (I->VectorizationFactor == VF)
+-      return I->VectorFnName;
++    if (I->VectorizationFactor == VF) {
++      if (FromSVML) {
++        return svmlMangle(I->VectorFnName, IsFast);
++      }
++      return I->VectorFnName.str();
++    }
+     ++I;
+   }
+-  return StringRef();
++  return std::string();
++}
++
++static CallingConv::ID getSVMLCallingConv(const DataLayout &DL, const FunctionType &FType)
++{
++  assert(isa<VectorType>(FType.getReturnType()));
++  auto *VecCallRetType = cast<VectorType>(FType.getReturnType());
++  auto TypeBitWidth = DL.getTypeSizeInBits(VecCallRetType);
++  if (TypeBitWidth == 128) {
++    return CallingConv::Intel_SVML128;
++  } else if (TypeBitWidth == 256) {
++    return CallingConv::Intel_SVML256;
++  } else if (TypeBitWidth == 512) {
++    return CallingConv::Intel_SVML512;
++  } else {
++    llvm_unreachable("Invalid vector width");
++  }
++  return 0; // not reachable
++}
++
++Optional<CallingConv::ID>
++TargetLibraryInfoImpl::getVectorizedFunctionCallingConv(
++    StringRef F, const FunctionType &FTy, const DataLayout &DL) const {
++  if (F.startswith("__svml")) {
++    return getSVMLCallingConv(DL, FTy);
++  }
++  return {};
+ }
+ 
+ TargetLibraryInfo TargetLibraryAnalysis::run(const Function &F,
+diff --git a/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp b/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp
+index e3bf41c9721b6..4f9dccd4e0724 100644
+--- a/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp
++++ b/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp
+@@ -603,6 +603,9 @@ lltok::Kind LLLexer::LexIdentifier() {
+   KEYWORD(spir_kernel);
+   KEYWORD(spir_func);
+   KEYWORD(intel_ocl_bicc);
++  KEYWORD(intel_svmlcc128);
++  KEYWORD(intel_svmlcc256);
++  KEYWORD(intel_svmlcc512);
+   KEYWORD(x86_64_sysvcc);
+   KEYWORD(win64cc);
+   KEYWORD(x86_regcallcc);
+diff --git a/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp b/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp
+index 432ec151cf8ae..3bd6ee61024b8 100644
+--- a/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp
++++ b/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp
+@@ -1781,6 +1781,9 @@ void LLParser::parseOptionalDLLStorageClass(unsigned &Res) {
+ ///   ::= 'ccc'
+ ///   ::= 'fastcc'
+ ///   ::= 'intel_ocl_bicc'
++///   ::= 'intel_svmlcc128'
++///   ::= 'intel_svmlcc256'
++///   ::= 'intel_svmlcc512'
+ ///   ::= 'coldcc'
+ ///   ::= 'cfguard_checkcc'
+ ///   ::= 'x86_stdcallcc'
+@@ -1850,6 +1853,9 @@ bool LLParser::parseOptionalCallingConv(unsigned &CC) {
+   case lltok::kw_spir_kernel:    CC = CallingConv::SPIR_KERNEL; break;
+   case lltok::kw_spir_func:      CC = CallingConv::SPIR_FUNC; break;
+   case lltok::kw_intel_ocl_bicc: CC = CallingConv::Intel_OCL_BI; break;
++  case lltok::kw_intel_svmlcc128:CC = CallingConv::Intel_SVML128; break;
++  case lltok::kw_intel_svmlcc256:CC = CallingConv::Intel_SVML256; break;
++  case lltok::kw_intel_svmlcc512:CC = CallingConv::Intel_SVML512; break;
+   case lltok::kw_x86_64_sysvcc:  CC = CallingConv::X86_64_SysV; break;
+   case lltok::kw_win64cc:        CC = CallingConv::Win64; break;
+   case lltok::kw_webkit_jscc:    CC = CallingConv::WebKit_JS; break;
+diff --git a/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp b/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp
+index 0ff045fa787e8..175651949ef85 100644
+--- a/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp
++++ b/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp
+@@ -157,7 +157,7 @@ static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI,
+   // and the exact vector width of the call operands in the
+   // TargetLibraryInfo.
+   const std::string TLIName =
+-      std::string(TLI.getVectorizedFunction(ScalarName, VF));
++      std::string(TLI.getVectorizedFunction(ScalarName, VF, CI.getFastMathFlags().isFast()));
+ 
+   LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Looking up TLI mapping for `"
+                     << ScalarName << "` and vector width " << VF << ".\n");
+diff --git a/llvm-14.0.6.src/lib/IR/AsmWriter.cpp b/llvm-14.0.6.src/lib/IR/AsmWriter.cpp
+index 179754e275b03..c4e95752c97e8 100644
+--- a/llvm-14.0.6.src/lib/IR/AsmWriter.cpp
++++ b/llvm-14.0.6.src/lib/IR/AsmWriter.cpp
+@@ -306,6 +306,9 @@ static void PrintCallingConv(unsigned cc, raw_ostream &Out) {
+   case CallingConv::X86_RegCall:   Out << "x86_regcallcc"; break;
+   case CallingConv::X86_VectorCall:Out << "x86_vectorcallcc"; break;
+   case CallingConv::Intel_OCL_BI:  Out << "intel_ocl_bicc"; break;
++  case CallingConv::Intel_SVML128: Out << "intel_svmlcc128"; break;
++  case CallingConv::Intel_SVML256: Out << "intel_svmlcc256"; break;
++  case CallingConv::Intel_SVML512: Out << "intel_svmlcc512"; break;
+   case CallingConv::ARM_APCS:      Out << "arm_apcscc"; break;
+   case CallingConv::ARM_AAPCS:     Out << "arm_aapcscc"; break;
+   case CallingConv::ARM_AAPCS_VFP: Out << "arm_aapcs_vfpcc"; break;
+diff --git a/llvm-14.0.6.src/lib/IR/Verifier.cpp b/llvm-14.0.6.src/lib/IR/Verifier.cpp
+index 989d01e2e3950..bae7382a36e13 100644
+--- a/llvm-14.0.6.src/lib/IR/Verifier.cpp
++++ b/llvm-14.0.6.src/lib/IR/Verifier.cpp
+@@ -2457,6 +2457,9 @@ void Verifier::visitFunction(const Function &F) {
+   case CallingConv::Fast:
+   case CallingConv::Cold:
+   case CallingConv::Intel_OCL_BI:
++  case CallingConv::Intel_SVML128:
++  case CallingConv::Intel_SVML256:
++  case CallingConv::Intel_SVML512:
+   case CallingConv::PTX_Kernel:
+   case CallingConv::PTX_Device:
+     Assert(!F.isVarArg(), "Calling convention does not support varargs or "
+diff --git a/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td b/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td
+index 4dd8a6cdd8982..12e65521215e4 100644
+--- a/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td
++++ b/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td
+@@ -498,6 +498,21 @@ def RetCC_X86_64 : CallingConv<[
+   CCDelegateTo<RetCC_X86_64_C>
+ ]>;
+ 
++// Intel_SVML return-value convention.
++def RetCC_Intel_SVML : CallingConv<[
++  // Vector types are returned in XMM0,XMM1
++  CCIfType<[v4f32, v2f64],
++            CCAssignToReg<[XMM0,XMM1]>>,
++
++  // 256-bit FP vectors
++  CCIfType<[v8f32, v4f64],
++            CCAssignToReg<[YMM0,YMM1]>>,
++
++  // 512-bit FP vectors
++  CCIfType<[v16f32, v8f64],
++            CCAssignToReg<[ZMM0,ZMM1]>>
++]>;
++
+ // This is the return-value convention used for the entire X86 backend.
+ let Entry = 1 in
+ def RetCC_X86 : CallingConv<[
+@@ -505,6 +520,10 @@ def RetCC_X86 : CallingConv<[
+   // Check if this is the Intel OpenCL built-ins calling convention
+   CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<RetCC_Intel_OCL_BI>>,
+ 
++  CCIfCC<"CallingConv::Intel_SVML128", CCDelegateTo<RetCC_Intel_SVML>>,
++  CCIfCC<"CallingConv::Intel_SVML256", CCDelegateTo<RetCC_Intel_SVML>>,
++  CCIfCC<"CallingConv::Intel_SVML512", CCDelegateTo<RetCC_Intel_SVML>>,
++
+   CCIfSubtarget<"is64Bit()", CCDelegateTo<RetCC_X86_64>>,
+   CCDelegateTo<RetCC_X86_32>
+ ]>;
+@@ -1064,6 +1083,30 @@ def CC_Intel_OCL_BI : CallingConv<[
+   CCDelegateTo<CC_X86_32_C>
+ ]>;
+ 
++// X86-64 Intel Short Vector Math Library calling convention.
++def CC_Intel_SVML : CallingConv<[
++
++  // The SSE vector arguments are passed in XMM registers.
++  CCIfType<[v4f32, v2f64],
++           CCAssignToReg<[XMM0, XMM1, XMM2]>>,
++
++  // The 256-bit vector arguments are passed in YMM registers.
++  CCIfType<[v8f32, v4f64],
++           CCAssignToReg<[YMM0, YMM1, YMM2]>>,
++
++  // The 512-bit vector arguments are passed in ZMM registers.
++  CCIfType<[v16f32, v8f64],
++           CCAssignToReg<[ZMM0, ZMM1, ZMM2]>>
++]>;
++
++def CC_X86_32_Intr : CallingConv<[
++  CCAssignToStack<4, 4>
++]>;
++
++def CC_X86_64_Intr : CallingConv<[
++  CCAssignToStack<8, 8>
++]>;
++
+ //===----------------------------------------------------------------------===//
+ // X86 Root Argument Calling Conventions
+ //===----------------------------------------------------------------------===//
+@@ -1115,6 +1158,9 @@ def CC_X86_64 : CallingConv<[
+ let Entry = 1 in
+ def CC_X86 : CallingConv<[
+   CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<CC_Intel_OCL_BI>>,
++  CCIfCC<"CallingConv::Intel_SVML128", CCDelegateTo<CC_Intel_SVML>>,
++  CCIfCC<"CallingConv::Intel_SVML256", CCDelegateTo<CC_Intel_SVML>>,
++  CCIfCC<"CallingConv::Intel_SVML512", CCDelegateTo<CC_Intel_SVML>>,
+   CCIfSubtarget<"is64Bit()", CCDelegateTo<CC_X86_64>>,
+   CCDelegateTo<CC_X86_32>
+ ]>;
+@@ -1227,3 +1273,27 @@ def CSR_SysV64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP,
+                                                (sequence "R%u", 12, 15))>;
+ def CSR_SysV64_RegCall       : CalleeSavedRegs<(add CSR_SysV64_RegCall_NoSSE,               
+                                                (sequence "XMM%u", 8, 15))>;
++
++// SVML calling convention
++def CSR_32_Intel_SVML        : CalleeSavedRegs<(add CSR_32_RegCall_NoSSE)>;
++def CSR_32_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_32_Intel_SVML,
++                                                K4, K5, K6, K7)>;
++
++def CSR_64_Intel_SVML_NoSSE : CalleeSavedRegs<(add RBX, RSI, RDI, RBP, RSP, R12, R13, R14, R15)>;
++
++def CSR_64_Intel_SVML       : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                               (sequence "XMM%u", 8, 15))>;
++def CSR_Win64_Intel_SVML    : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                               (sequence "XMM%u", 6, 15))>;
++
++def CSR_64_Intel_SVML_AVX        : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                                    (sequence "YMM%u", 8, 15))>;
++def CSR_Win64_Intel_SVML_AVX     : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                                    (sequence "YMM%u", 6, 15))>;
++
++def CSR_64_Intel_SVML_AVX512     : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                                    (sequence "ZMM%u", 16, 31),
++                                                    K4, K5, K6, K7)>;
++def CSR_Win64_Intel_SVML_AVX512  : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
++                                                    (sequence "ZMM%u", 6, 21),
++                                                    K4, K5, K6, K7)>;
+diff --git a/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp b/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp
+index 8bb7e81e19bbd..1780ce3fc6467 100644
+--- a/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp
++++ b/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp
+@@ -3788,7 +3788,8 @@ void VarArgsLoweringHelper::forwardMustTailParameters(SDValue &Chain) {
+   // FIXME: Only some x86_32 calling conventions support AVX512.
+   if (Subtarget.useAVX512Regs() &&
+       (is64Bit() || (CallConv == CallingConv::X86_VectorCall ||
+-                     CallConv == CallingConv::Intel_OCL_BI)))
++                     CallConv == CallingConv::Intel_OCL_BI   ||
++                     CallConv == CallingConv::Intel_SVML512)))
+     VecVT = MVT::v16f32;
+   else if (Subtarget.hasAVX())
+     VecVT = MVT::v8f32;
+diff --git a/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp b/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp
+index 130cb61cdde24..9eec3b25ca9f2 100644
+--- a/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp
++++ b/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp
+@@ -272,6 +272,42 @@ X86RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
+   }
+ }
+ 
++namespace {
++std::pair<const uint32_t *, const MCPhysReg *> getSVMLRegMaskAndSaveList(
++  bool Is64Bit, bool IsWin64, CallingConv::ID CC) {
++  assert(CC >= CallingConv::Intel_SVML128 && CC <= CallingConv::Intel_SVML512);
++  unsigned Abi = CC - CallingConv::Intel_SVML128 ; // 0 - 128, 1 - 256, 2 - 512
++
++  const std::pair<const uint32_t *, const MCPhysReg *> Abi64[] = {
++    std::make_pair(CSR_64_Intel_SVML_RegMask,        CSR_64_Intel_SVML_SaveList),
++    std::make_pair(CSR_64_Intel_SVML_AVX_RegMask,    CSR_64_Intel_SVML_AVX_SaveList),
++    std::make_pair(CSR_64_Intel_SVML_AVX512_RegMask, CSR_64_Intel_SVML_AVX512_SaveList),
++  };
++
++  const std::pair<const uint32_t *, const MCPhysReg *> AbiWin64[] = {
++    std::make_pair(CSR_Win64_Intel_SVML_RegMask,        CSR_Win64_Intel_SVML_SaveList),
++    std::make_pair(CSR_Win64_Intel_SVML_AVX_RegMask,    CSR_Win64_Intel_SVML_AVX_SaveList),
++    std::make_pair(CSR_Win64_Intel_SVML_AVX512_RegMask, CSR_Win64_Intel_SVML_AVX512_SaveList),
++  };
++
++  const std::pair<const uint32_t *, const MCPhysReg *> Abi32[] = {
++    std::make_pair(CSR_32_Intel_SVML_RegMask,        CSR_32_Intel_SVML_SaveList),
++    std::make_pair(CSR_32_Intel_SVML_RegMask,        CSR_32_Intel_SVML_SaveList),
++    std::make_pair(CSR_32_Intel_SVML_AVX512_RegMask, CSR_32_Intel_SVML_AVX512_SaveList),
++  };
++
++  if (Is64Bit) {
++    if (IsWin64) {
++      return AbiWin64[Abi];
++    } else {
++      return Abi64[Abi];
++    }
++  } else {
++    return Abi32[Abi];
++  }
++}
++}
++
+ const MCPhysReg *
+ X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
+   assert(MF && "MachineFunction required");
+@@ -327,6 +363,11 @@ X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
+       return CSR_64_Intel_OCL_BI_SaveList;
+     break;
+   }
++  case CallingConv::Intel_SVML128:
++  case CallingConv::Intel_SVML256:
++  case CallingConv::Intel_SVML512: {
++    return getSVMLRegMaskAndSaveList(Is64Bit, IsWin64, CC).second;
++  }
+   case CallingConv::HHVM:
+     return CSR_64_HHVM_SaveList;
+   case CallingConv::X86_RegCall:
+@@ -449,6 +490,11 @@ X86RegisterInfo::getCallPreservedMask(const MachineFunction &MF,
+       return CSR_64_Intel_OCL_BI_RegMask;
+     break;
+   }
++  case CallingConv::Intel_SVML128:
++  case CallingConv::Intel_SVML256:
++  case CallingConv::Intel_SVML512: {
++    return getSVMLRegMaskAndSaveList(Is64Bit, IsWin64, CC).first;
++  }
+   case CallingConv::HHVM:
+     return CSR_64_HHVM_RegMask;
+   case CallingConv::X86_RegCall:
+diff --git a/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h b/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h
+index 5d773f0c57dfb..6bdf5bc6f3fe9 100644
+--- a/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h
++++ b/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h
+@@ -916,6 +916,9 @@ class X86Subtarget final : public X86GenSubtargetInfo {
+     case CallingConv::X86_ThisCall:
+     case CallingConv::X86_VectorCall:
+     case CallingConv::Intel_OCL_BI:
++    case CallingConv::Intel_SVML128:
++    case CallingConv::Intel_SVML256:
++    case CallingConv::Intel_SVML512:
+       return isTargetWin64();
+     // This convention allows using the Win64 convention on other targets.
+     case CallingConv::Win64:
+diff --git a/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp b/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp
+index 047bf5569ded3..59897785f156c 100644
+--- a/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp
++++ b/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp
+@@ -92,7 +92,7 @@ static void addMappingsFromTLI(const TargetLibraryInfo &TLI, CallInst &CI) {
+ 
+   auto AddVariantDecl = [&](const ElementCount &VF) {
+     const std::string TLIName =
+-        std::string(TLI.getVectorizedFunction(ScalarName, VF));
++        std::string(TLI.getVectorizedFunction(ScalarName, VF, CI.getFastMathFlags().isFast()));
+     if (!TLIName.empty()) {
+       std::string MangledName =
+           VFABI::mangleTLIVectorName(TLIName, ScalarName, CI.arg_size(), VF);
+diff --git a/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp
+index 46ff0994e04e7..f472af5e1a835 100644
+--- a/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp
++++ b/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp
+@@ -712,6 +712,27 @@ class InnerLoopVectorizer {
+   virtual void printDebugTracesAtStart(){};
+   virtual void printDebugTracesAtEnd(){};
+ 
++  /// Check legality of given SVML call instruction \p VecCall generated for
++  /// scalar call \p Call. If illegal then the appropriate legal instruction
++  /// is returned.
++  Value *legalizeSVMLCall(CallInst *VecCall, CallInst *Call);
++
++  /// Returns the legal VF for a call instruction \p CI using TTI information
++  /// and vector type.
++  ElementCount getLegalVFForCall(CallInst *CI);
++
++  /// Partially vectorize a given call \p Call by breaking it down into multiple
++  /// calls of \p LegalCall, decided by the variant VF \p LegalVF.
++  Value *partialVectorizeCall(CallInst *Call, CallInst *LegalCall,
++                              unsigned LegalVF);
++
++  /// Generate shufflevector instruction for a vector value \p V based on the
++  /// current \p Part and a smaller VF \p LegalVF.
++  Value *generateShuffleValue(Value *V, unsigned LegalVF, unsigned Part);
++
++  /// Combine partially vectorized calls stored in \p CallResults.
++  Value *combinePartialVecCalls(SmallVectorImpl<Value *> &CallResults);
++
+   /// The original loop.
+   Loop *OrigLoop;
+ 
+@@ -4596,6 +4617,17 @@ static bool mayDivideByZero(Instruction &I) {
+   return !CInt || CInt->isZero();
+ }
+ 
++static void setVectorFunctionCallingConv(CallInst &CI, const DataLayout &DL,
++                                         const TargetLibraryInfo &TLI) {
++  Function *VectorF = CI.getCalledFunction();
++  FunctionType *FTy = VectorF->getFunctionType();
++  StringRef VFName = VectorF->getName();
++  auto CC = TLI.getVectorizedFunctionCallingConv(VFName, *FTy, DL);
++  if (CC) {
++    CI.setCallingConv(*CC);
++  }
++}
++
+ void InnerLoopVectorizer::widenCallInstruction(CallInst &I, VPValue *Def,
+                                                VPUser &ArgOperands,
+                                                VPTransformState &State) {
+@@ -4664,9 +4696,246 @@ void InnerLoopVectorizer::widenCallInstruction(CallInst &I, VPValue *Def,
+       if (isa<FPMathOperator>(V))
+         V->copyFastMathFlags(CI);
+ 
++    const DataLayout &DL = V->getModule()->getDataLayout();
++    setVectorFunctionCallingConv(*V, DL, *TLI);
++
++    // Perform legalization of SVML call instruction only if original call
++    // was not Intrinsic
++    if (!UseVectorIntrinsic &&
++        (V->getCalledFunction()->getName()).startswith("__svml")) {
++      // assert((V->getCalledFunction()->getName()).startswith("__svml"));
++      LLVM_DEBUG(dbgs() << "LV(SVML): Vector call inst:"; V->dump());
++      auto *LegalV = cast<Instruction>(legalizeSVMLCall(V, CI));
++      LLVM_DEBUG(dbgs() << "LV: Completed SVML legalization.\n LegalV: ";
++                 LegalV->dump());
++      State.set(Def, LegalV, Part);
++      addMetadata(LegalV, &I);
++    } else {
+       State.set(Def, V, Part);
+       addMetadata(V, &I);
++    }
++  }
++}
++
++//===----------------------------------------------------------------------===//
++// Implementation of functions for SVML vector call legalization.
++//===----------------------------------------------------------------------===//
++//
++// Unlike other VECLIBs, SVML needs to be used with target-legal
++// vector types. Otherwise, link failures and/or runtime failures
++// will occur. A motivating example could be -
++//
++//   double *a;
++//   float *b;
++//   #pragma clang loop vectorize_width(8)
++//   for(i = 0; i < N; ++i) {
++//     a[i] = sin(i);   // Legal SVML VF must be 4 or below on AVX
++//     b[i] = cosf(i);  // VF can be 8 on AVX since 8 floats can fit in YMM
++//    }
++//
++// Current implementation of vector code generation in LV is
++// driven based on a single VF (in InnerLoopVectorizer::VF). This
++// inhibits the flexibility of adjusting/choosing different VF
++// for different instructions.
++//
++// Due to this limitation it is much more straightforward to
++// first generate the illegal sin8 (svml_sin8 for SVML vector
++// library) call and then legalize it than trying to avoid
++// generating illegal code from the beginning.
++//
++// A solution for this problem is to check legality of the
++// call instruction right after generating it in vectorizer and
++// if it is illegal we split the call arguments and issue multiple
++// calls to match the legal VF. This is demonstrated currently for
++// the SVML vector library calls (non-intrinsic version only).
++//
++// Future directions and extensions:
++// 1) This legalization example shows us that a good direction
++//    for the VPlan framework would be to model the vector call
++//    instructions in a way that legal VF for each call is chosen
++//    correctly within vectorizer and illegal code generation is
++//    avoided.
++// 2) This logic can also be extended to general vector functions
++//    i.e. legalization OpenMP decalre simd functions. The
++//    requirements needed for this will be documented soon.
++
++Value *InnerLoopVectorizer::legalizeSVMLCall(CallInst *VecCall,
++                                             CallInst *Call) {
++  ElementCount LegalVF = getLegalVFForCall(VecCall);
++
++  assert(LegalVF.getKnownMinValue() > 1 &&
++         "Legal VF for SVML call must be greater than 1 to vectorize");
++
++  if (LegalVF == VF)
++    return VecCall;
++  else if (LegalVF.getKnownMinValue() > VF.getKnownMinValue())
++    // TODO: handle case when we are underfilling vectors
++    return VecCall;
++
++  // Legal VF for this SVML call is smaller than chosen VF, break it down into
++  // smaller call instructions
++
++  // Convert args, types and return type to match legal VF
++  SmallVector<Type *, 4> NewTys;
++  SmallVector<Value *, 4> NewArgs;
++
++  for (Value *ArgOperand : Call->args()) {
++    Type *Ty = ToVectorTy(ArgOperand->getType(), LegalVF);
++    NewTys.push_back(Ty);
++    NewArgs.push_back(UndefValue::get(Ty));
+   }
++
++  // Construct legal vector function
++  const VFShape Shape =
++    VFShape::get(*Call, LegalVF /*EC*/, false /*HasGlobalPred*/);
++  Function *LegalVectorF = VFDatabase(*Call).getVectorizedFunction(Shape);
++  assert(LegalVectorF != nullptr && "Can't create legal vector function.");
++
++  LLVM_DEBUG(dbgs() << "LV(SVML): LegalVectorF: "; LegalVectorF->dump());
++
++  SmallVector<OperandBundleDef, 1> OpBundles;
++  Call->getOperandBundlesAsDefs(OpBundles);
++  auto LegalV = std::unique_ptr<CallInst>(CallInst::Create(LegalVectorF, NewArgs, OpBundles));
++
++  if (isa<FPMathOperator>(LegalV))
++    LegalV->copyFastMathFlags(Call);
++
++  const DataLayout &DL = VecCall->getModule()->getDataLayout();
++  // Set SVML calling conventions
++  setVectorFunctionCallingConv(*LegalV, DL, *TLI);
++
++  LLVM_DEBUG(dbgs() << "LV(SVML): LegalV: "; LegalV->dump());
++
++  Value *LegalizedCall = partialVectorizeCall(VecCall, LegalV.get(), LegalVF.getKnownMinValue());
++
++  LLVM_DEBUG(dbgs() << "LV(SVML): LegalizedCall: "; LegalizedCall->dump());
++
++  // Remove the illegal call from Builder
++  VecCall->eraseFromParent();
++
++  return LegalizedCall;
++}
++
++ElementCount InnerLoopVectorizer::getLegalVFForCall(CallInst *CI) {
++  const DataLayout DL = CI->getModule()->getDataLayout();
++  FunctionType *CallFT = CI->getFunctionType();
++  // All functions that need legalization should have a vector return type.
++  // This is true for all SVML functions that are currently supported.
++  assert(isa<VectorType>(CallFT->getReturnType()) &&
++         "Return type of call that needs legalization is not a vector.");
++  auto *VecCallRetType = cast<VectorType>(CallFT->getReturnType());
++  Type *ElemType = VecCallRetType->getElementType();
++
++  unsigned TypeBitWidth = DL.getTypeSizeInBits(ElemType);
++  unsigned VectorBitWidth = TTI->getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
++  unsigned LegalVF = VectorBitWidth / TypeBitWidth;
++
++  LLVM_DEBUG(dbgs() << "LV(SVML): Type Bit Width: " << TypeBitWidth << "\n");
++  LLVM_DEBUG(dbgs() << "LV(SVML): Current VL: " << VF << "\n");
++  LLVM_DEBUG(dbgs() << "LV(SVML): Vector Bit Width: " << VectorBitWidth
++                    << "\n");
++  LLVM_DEBUG(dbgs() << "LV(SVML): Legal Target VL: " << LegalVF << "\n");
++
++  return ElementCount::getFixed(LegalVF);
++}
++
++// Partial vectorization of a call instruction is achieved by making clones of
++// \p LegalCall and overwriting its argument operands with shufflevector
++// equivalents chosen based on \p LegalVF and the current Part being filled.
++Value *InnerLoopVectorizer::partialVectorizeCall(CallInst *Call,
++                                                 CallInst *LegalCall,
++                                                 unsigned LegalVF) {
++  unsigned NumParts = VF.getKnownMinValue() / LegalVF;
++  LLVM_DEBUG(dbgs() << "LV(SVML): NumParts: " << NumParts << "\n");
++  SmallVector<Value *, 8> CallResults;
++
++  for (unsigned Part = 0; Part < NumParts; ++Part) {
++    auto *ClonedCall = cast<CallInst>(LegalCall->clone());
++
++    // Update the arg operand of cloned call to shufflevector
++    for (unsigned i = 0, ie = Call->arg_size(); i != ie; ++i) {
++      auto *NewOp = generateShuffleValue(Call->getArgOperand(i), LegalVF, Part);
++      ClonedCall->setArgOperand(i, NewOp);
++    }
++
++    LLVM_DEBUG(dbgs() << "LV(SVML): ClonedCall: "; ClonedCall->dump());
++
++    auto *PartialVecCall = Builder.Insert(ClonedCall);
++    CallResults.push_back(PartialVecCall);
++  }
++
++  return combinePartialVecCalls(CallResults);
++}
++
++Value *InnerLoopVectorizer::generateShuffleValue(Value *V, unsigned LegalVF,
++                                                 unsigned Part) {
++  // Example:
++  // Consider the following vector code -
++  // %1 = sitofp <4 x i32> %0 to <4 x double>
++  // %2 = call <4 x double> @__svml_sin4(<4 x double> %1)
++  //
++  // If the LegalVF is 2, we partially vectorize the sin4 call by invoking
++  // generateShuffleValue on the operand %1.
++  // If Part = 0, the output value is -
++  // %shuffle = shufflevector <4 x double> %1, <4 x double> undef, <2 x i32> <i32 0, i32 1>
++  // and if Part = 1, the output is -
++  // %shuffle7 = shufflevector <4 x double> %1, <4 x double> undef, <2 x i32> <i32 2, i32 3>
++
++  assert(isa<VectorType>(V->getType()) &&
++         "Cannot generate shuffles for non-vector values.");
++  SmallVector<int, 4> ShuffleMask;
++  Value *Undef = UndefValue::get(V->getType());
++
++  unsigned ElemIdx = Part * LegalVF;
++
++  for (unsigned K = 0; K < LegalVF; K++)
++    ShuffleMask.push_back(static_cast<int>(ElemIdx + K));
++
++  auto *ShuffleInst =
++      Builder.CreateShuffleVector(V, Undef, ShuffleMask, "shuffle");
++
++  return ShuffleInst;
++}
++
++// Results of the calls executed by smaller legal call instructions must be
++// combined to match the original VF for later use. This is done by constructing
++// shufflevector instructions in a cumulative fashion.
++Value *InnerLoopVectorizer::combinePartialVecCalls(
++    SmallVectorImpl<Value *> &CallResults) {
++  assert(isa<VectorType>(CallResults[0]->getType()) &&
++         "Cannot combine calls with non-vector results.");
++  auto *CallType = cast<VectorType>(CallResults[0]->getType());
++
++  Value *CombinedShuffle;
++  unsigned NumElems = CallType->getElementCount().getKnownMinValue() * 2;
++  unsigned NumRegs = CallResults.size();
++
++  assert(NumRegs >= 2 && isPowerOf2_32(NumRegs) &&
++         "Number of partial vector calls to combine must be a power of 2 "
++         "(atleast 2^1)");
++
++  while (NumRegs > 1) {
++    for (unsigned I = 0; I < NumRegs; I += 2) {
++      SmallVector<int, 4> ShuffleMask;
++      for (unsigned J = 0; J < NumElems; J++)
++        ShuffleMask.push_back(static_cast<int>(J));
++
++      CombinedShuffle = Builder.CreateShuffleVector(
++          CallResults[I], CallResults[I + 1], ShuffleMask, "combined");
++      LLVM_DEBUG(dbgs() << "LV(SVML): CombinedShuffle:";
++                 CombinedShuffle->dump());
++      CallResults.push_back(CombinedShuffle);
++    }
++
++    SmallVector<Value *, 2>::iterator Start = CallResults.begin();
++    SmallVector<Value *, 2>::iterator End = Start + NumRegs;
++    CallResults.erase(Start, End);
++
++    NumElems *= 2;
++    NumRegs /= 2;
++  }
++
++  return CombinedShuffle;
+ }
+ 
+ void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {
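To make the splitting scheme described in the comment block above concrete: the shuffle masks come from straightforward index arithmetic. The following standalone C++ sketch (illustrative only, no LLVM APIs; the names VF, LegalVF, and Part mirror the patch) reproduces the mask computation of generateShuffleValue and the pairwise widening of combinePartialVecCalls for VF = 8 and LegalVF = 4:

// Standalone sketch of the shuffle-mask arithmetic used by the patch above.
// Illustrative only: no LLVM APIs, just the index math. For VF = 8 and
// LegalVF = 4 it prints the two extraction masks built by
// generateShuffleValue and the widths walked by combinePartialVecCalls.
#include <cstdio>

int main() {
  const unsigned VF = 8;      // chosen (illegal) vectorization factor
  const unsigned LegalVF = 4; // target-legal VF for the SVML call
  const unsigned NumParts = VF / LegalVF;

  // generateShuffleValue: each Part extracts LegalVF contiguous lanes,
  // starting at ElemIdx = Part * LegalVF.
  for (unsigned Part = 0; Part < NumParts; ++Part) {
    std::printf("Part %u mask:", Part);
    for (unsigned K = 0; K < LegalVF; ++K)
      std::printf(" %u", Part * LegalVF + K);
    std::printf("\n");
  }

  // combinePartialVecCalls: pairwise shuffles double the width each round
  // until the original VF is restored.
  unsigned NumElems = LegalVF * 2;
  unsigned NumRegs = NumParts;
  while (NumRegs > 1) {
    std::printf("combine %u results into <%u x ...> with mask 0..%u\n",
                NumRegs, NumElems, NumElems - 1);
    NumElems *= 2;
    NumRegs /= 2;
  }
  return 0;
}

The svml-legal-codegen.ll test added further down in this patch shows the same split at the IR level.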
+diff --git a/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp
+index 644372483edde..342f018b92184 100644
+--- a/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp
++++ b/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp
+@@ -6322,6 +6322,17 @@ Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {
+   return Vec;
+ }
+ 
++static void setVectorFunctionCallingConv(CallInst &CI, const DataLayout &DL,
++                                         const TargetLibraryInfo &TLI) {
++  Function *VectorF = CI.getCalledFunction();
++  FunctionType *FTy = VectorF->getFunctionType();
++  StringRef VFName = VectorF->getName();
++  auto CC = TLI.getVectorizedFunctionCallingConv(VFName, *FTy, DL);
++  if (CC) {
++    CI.setCallingConv(*CC);
++  }
++}
++
+ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
+   IRBuilder<>::InsertPointGuard Guard(Builder);
+ 
+@@ -6794,7 +6805,12 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
+ 
+       SmallVector<OperandBundleDef, 1> OpBundles;
+       CI->getOperandBundlesAsDefs(OpBundles);
+-      Value *V = Builder.CreateCall(CF, OpVecs, OpBundles);
++
++      CallInst *NewCall = Builder.CreateCall(CF, OpVecs, OpBundles);
++      const DataLayout &DL = NewCall->getModule()->getDataLayout();
++      setVectorFunctionCallingConv(*NewCall, DL, *TLI);
++
++      Value *V = NewCall;
+ 
+       // The scalar argument uses an in-tree scalar so we add the new vectorized
+       // call to ExternalUses list to make sure that an extract will be
+diff --git a/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll b/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll
+index df8b7c498bd00..63a36549f18fd 100644
+--- a/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll
++++ b/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll
+@@ -10,7 +10,7 @@ target triple = "x86_64-unknown-linux-gnu"
+ define <4 x double> @exp_v4(<4 x double> %in) {
+ ; SVML-LABEL: define {{[^@]+}}@exp_v4
+ ; SVML-SAME: (<4 x double> [[IN:%.*]]) {
+-; SVML-NEXT:    [[TMP1:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[IN]])
++; SVML-NEXT:    [[TMP1:%.*]] = call <4 x double> @__svml_exp4_ha(<4 x double> [[IN]])
+ ; SVML-NEXT:    ret <4 x double> [[TMP1]]
+ ;
+ ; LIBMVEC-X86-LABEL: define {{[^@]+}}@exp_v4
+@@ -37,7 +37,7 @@ declare <4 x double> @llvm.exp.v4f64(<4 x double>) #0
+ define <4 x float> @exp_f32(<4 x float> %in) {
+ ; SVML-LABEL: define {{[^@]+}}@exp_f32
+ ; SVML-SAME: (<4 x float> [[IN:%.*]]) {
+-; SVML-NEXT:    [[TMP1:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[IN]])
++; SVML-NEXT:    [[TMP1:%.*]] = call <4 x float> @__svml_expf4_ha(<4 x float> [[IN]])
+ ; SVML-NEXT:    ret <4 x float> [[TMP1]]
+ ;
+ ; LIBMVEC-X86-LABEL: define {{[^@]+}}@exp_f32
+diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll
+index a6e191c3d6923..d6e2e11106949 100644
+--- a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll
++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll
+@@ -39,7 +39,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__exp_finite(double) #0
+ 
+ ; CHECK-LABEL: @exp_f64
+-; CHECK: <4 x double> @__svml_exp4
++; CHECK: <2 x double> @__svml_exp2
++; CHECK: <2 x double> @__svml_exp2
+ ; CHECK: ret
+ define void @exp_f64(double* nocapture %varray) {
+ entry:
+@@ -99,7 +100,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__log_finite(double) #0
+ 
+ ; CHECK-LABEL: @log_f64
+-; CHECK: <4 x double> @__svml_log4
++; CHECK: <2 x double> @__svml_log2
++; CHECK: <2 x double> @__svml_log2
+ ; CHECK: ret
+ define void @log_f64(double* nocapture %varray) {
+ entry:
+@@ -159,7 +161,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__pow_finite(double, double) #0
+ 
+ ; CHECK-LABEL: @pow_f64
+-; CHECK: <4 x double> @__svml_pow4
++; CHECK: <2 x double> @__svml_pow2
++; CHECK: <2 x double> @__svml_pow2
+ ; CHECK: ret
+ define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
+ entry:
+@@ -190,7 +193,8 @@ declare float @__exp2f_finite(float) #0
+ 
+ define void @exp2f_finite(float* nocapture %varray) {
+ ; CHECK-LABEL: @exp2f_finite(
+-; CHECK:    call <4 x float> @__svml_exp2f4(<4 x float> %{{.*}})
++; CHECK:    call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> %{{.*}})
++; CHECK:    call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> %{{.*}})
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -219,7 +223,8 @@ declare double @__exp2_finite(double) #0
+ 
+ define void @exp2_finite(double* nocapture %varray) {
+ ; CHECK-LABEL: @exp2_finite(
+-; CHECK:    call <4 x double> @__svml_exp24(<4 x double> {{.*}})
++; CHECK:    call intel_svmlcc128 <2 x double> @__svml_exp22_ha(<2 x double> {{.*}})
++; CHECK:    call intel_svmlcc128 <2 x double> @__svml_exp22_ha(<2 x double> {{.*}})
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -276,7 +281,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__log2_finite(double) #0
+ 
+ ; CHECK-LABEL: @log2_f64
+-; CHECK: <4 x double> @__svml_log24
++; CHECK: <2 x double> @__svml_log22
++; CHECK: <2 x double> @__svml_log22
+ ; CHECK: ret
+ define void @log2_f64(double* nocapture %varray) {
+ entry:
+@@ -333,7 +339,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__log10_finite(double) #0
+ 
+ ; CHECK-LABEL: @log10_f64
+-; CHECK: <4 x double> @__svml_log104
++; CHECK: <2 x double> @__svml_log102
++; CHECK: <2 x double> @__svml_log102
+ ; CHECK: ret
+ define void @log10_f64(double* nocapture %varray) {
+ entry:
+@@ -390,7 +397,8 @@ for.end:                                          ; preds = %for.body
+ declare double @__sqrt_finite(double) #0
+ 
+ ; CHECK-LABEL: @sqrt_f64
+-; CHECK: <4 x double> @__svml_sqrt4
++; CHECK: <2 x double> @__svml_sqrt2
++; CHECK: <2 x double> @__svml_sqrt2
+ ; CHECK: ret
+ define void @sqrt_f64(double* nocapture %varray) {
+ entry:
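The updated CHECK lines above follow directly from getLegalVFForCall: the legal VF is the target's fixed-width vector register size divided by the element size, and a wider chosen VF is split into VF/LegalVF calls. A minimal C++ sketch of that arithmetic, assuming 128-bit vector registers for this test's configuration (the RUN line is not visible in this hunk, so the register width is an assumption):

// Sketch of the legal-VF computation in getLegalVFForCall above, assuming
// 128-bit vector registers (an assumption about this test's target).
#include <cstdio>

unsigned legalVF(unsigned RegisterBits, unsigned ElemBits) {
  // LegalVF = VectorBitWidth / TypeBitWidth, as in getLegalVFForCall.
  return RegisterBits / ElemBits;
}

int main() {
  const unsigned RegBits = 128;                // assumed register width
  unsigned LegalDouble = legalVF(RegBits, 64); // 2 lanes of double
  unsigned LegalFloat = legalVF(RegBits, 32);  // 4 lanes of float
  // A chosen VF of 4 doubles is therefore split into 4/2 = 2 calls to the
  // <2 x double> SVML entry point, matching the doubled CHECK lines.
  std::printf("double: LegalVF=%u, calls for VF=4: %u\n", LegalDouble,
              4 / LegalDouble);
  std::printf("float:  LegalVF=%u, calls for VF=8: %u\n", LegalFloat,
              8 / LegalFloat);
  return 0;
}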
+diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll
+index 42c280df6ad02..088bbdcf1aa4a 100644
+--- a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll
++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll
+@@ -48,7 +48,7 @@ declare float @llvm.exp2.f32(float) #0
+ 
+ define void @sin_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @sin_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -71,7 +71,7 @@ for.end:
+ 
+ define void @sin_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @sin_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -94,7 +94,7 @@ for.end:
+ 
+ define void @sin_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @sin_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -117,7 +117,7 @@ for.end:
+ 
+ define void @sin_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @sin_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -140,7 +140,7 @@ for.end:
+ 
+ define void @cos_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @cos_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -163,7 +163,7 @@ for.end:
+ 
+ define void @cos_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @cos_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -186,7 +186,7 @@ for.end:
+ 
+ define void @cos_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @cos_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -209,7 +209,7 @@ for.end:
+ 
+ define void @cos_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @cos_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -232,7 +232,7 @@ for.end:
+ 
+ define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
+ ; CHECK-LABEL: @pow_f64(
+-; CHECK:    [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
++; CHECK:    [[TMP8:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -257,7 +257,7 @@ for.end:
+ 
+ define void @pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) {
+ ; CHECK-LABEL: @pow_f64_intrinsic(
+-; CHECK:    [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
++; CHECK:    [[TMP8:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -282,7 +282,7 @@ for.end:
+ 
+ define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
+ ; CHECK-LABEL: @pow_f32(
+-; CHECK:    [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
++; CHECK:    [[TMP8:%.*]] = call intel_svmlcc128 <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -307,7 +307,7 @@ for.end:
+ 
+ define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) {
+ ; CHECK-LABEL: @pow_f32_intrinsic(
+-; CHECK:    [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
++; CHECK:    [[TMP8:%.*]] = call intel_svmlcc128 <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -332,7 +332,7 @@ for.end:
+ 
+ define void @exp_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @exp_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -355,7 +355,7 @@ for.end:
+ 
+ define void @exp_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @exp_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -378,7 +378,7 @@ for.end:
+ 
+ define void @exp_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @exp_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -401,7 +401,7 @@ for.end:
+ 
+ define void @exp_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @exp_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -424,7 +424,7 @@ for.end:
+ 
+ define void @log_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @log_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -447,7 +447,7 @@ for.end:
+ 
+ define void @log_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @log_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -470,7 +470,7 @@ for.end:
+ 
+ define void @log_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @log_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -493,7 +493,7 @@ for.end:
+ 
+ define void @log_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @log_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -516,7 +516,7 @@ for.end:
+ 
+ define void @log2_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @log2_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log24(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log24_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -539,7 +539,7 @@ for.end:
+ 
+ define void @log2_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @log2_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_log2f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log2f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -562,7 +562,7 @@ for.end:
+ 
+ define void @log2_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @log2_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log24(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log24_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -585,7 +585,7 @@ for.end:
+ 
+ define void @log2_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @log2_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_log2f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log2f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -608,7 +608,7 @@ for.end:
+ 
+ define void @log10_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @log10_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log104(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log104_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -631,7 +631,7 @@ for.end:
+ 
+ define void @log10_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @log10_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_log10f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log10f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -654,7 +654,7 @@ for.end:
+ 
+ define void @log10_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @log10_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_log104(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log104_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -677,7 +677,7 @@ for.end:
+ 
+ define void @log10_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @log10_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_log10f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log10f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -700,7 +700,7 @@ for.end:
+ 
+ define void @sqrt_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @sqrt_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_sqrt4(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sqrt4_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -723,7 +723,7 @@ for.end:
+ 
+ define void @sqrt_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @sqrt_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_sqrtf4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sqrtf4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -746,7 +746,7 @@ for.end:
+ 
+ define void @exp2_f64(double* nocapture %varray) {
+ ; CHECK-LABEL: @exp2_f64(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_exp24(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp24_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -769,7 +769,7 @@ for.end:
+ 
+ define void @exp2_f32(float* nocapture %varray) {
+ ; CHECK-LABEL: @exp2_f32(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_exp2f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -792,7 +792,7 @@ for.end:
+ 
+ define void @exp2_f64_intrinsic(double* nocapture %varray) {
+ ; CHECK-LABEL: @exp2_f64_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x double> @__svml_exp24(<4 x double> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp24_ha(<4 x double> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -815,7 +815,7 @@ for.end:
+ 
+ define void @exp2_f32_intrinsic(float* nocapture %varray) {
+ ; CHECK-LABEL: @exp2_f32_intrinsic(
+-; CHECK:    [[TMP5:%.*]] = call <4 x float> @__svml_exp2f4(<4 x float> [[TMP4:%.*]])
++; CHECK:    [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> [[TMP4:%.*]])
+ ; CHECK:    ret void
+ ;
+ entry:
+@@ -836,4 +836,44 @@ for.end:
+   ret void
+ }
+ 
++; CHECK-LABEL: @atan2_finite
++; CHECK: intel_svmlcc256 <4 x double> @__svml_atan24(
++; CHECK: intel_svmlcc256 <4 x double> @__svml_atan24(
++; CHECK: ret
++
++declare double @__atan2_finite(double, double) local_unnamed_addr #0
++
++define void @atan2_finite([100 x double]* nocapture %varray) local_unnamed_addr #0 {
++entry:
++  br label %for.cond1.preheader
++
++for.cond1.preheader:                              ; preds = %for.inc7, %entry
++  %indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc7 ]
++  %0 = trunc i64 %indvars.iv19 to i32
++  %conv = sitofp i32 %0 to double
++  br label %for.body3
++
++for.body3:                                        ; preds = %for.body3, %for.cond1.preheader
++  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
++  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
++  %1 = trunc i64 %indvars.iv.next to i32
++  %conv4 = sitofp i32 %1 to double
++  %call = tail call fast double @__atan2_finite(double %conv, double %conv4)
++  %arrayidx6 = getelementptr inbounds [100 x double], [100 x double]* %varray, i64 %indvars.iv19, i64 %indvars.iv
++  store double %call, double* %arrayidx6, align 8
++  %exitcond = icmp eq i64 %indvars.iv.next, 100
++  br i1 %exitcond, label %for.inc7, label %for.body3, !llvm.loop !5
++
++for.inc7:                                         ; preds = %for.body3
++  %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
++  %exitcond21 = icmp eq i64 %indvars.iv.next20, 100
++  br i1 %exitcond21, label %for.end9, label %for.cond1.preheader
++
++for.end9:                                         ; preds = %for.inc7
++  ret void
++}
++
+ attributes #0 = { nounwind readnone }
++!5 = distinct !{!5, !6, !7}
++!6 = !{!"llvm.loop.vectorize.width", i32 8}
++!7 = !{!"llvm.loop.vectorize.enable", i1 true}
+diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll
+new file mode 100644
+index 0000000000000..326c763994343
+--- /dev/null
++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll
+@@ -0,0 +1,513 @@
++; Check legalization of SVML calls, including intrinsic versions (like @llvm.<fn_name>.<type>).
++
++; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -force-vector-width=8 -force-vector-interleave=1 -mattr=avx -S < %s | FileCheck %s
++
++target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
++target triple = "x86_64-unknown-linux-gnu"
++
++declare double @sin(double) #0
++declare float @sinf(float) #0
++declare double @llvm.sin.f64(double) #0
++declare float @llvm.sin.f32(float) #0
++
++declare double @cos(double) #0
++declare float @cosf(float) #0
++declare double @llvm.cos.f64(double) #0
++declare float @llvm.cos.f32(float) #0
++
++declare double @pow(double, double) #0
++declare float @powf(float, float) #0
++declare double @llvm.pow.f64(double, double) #0
++declare float @llvm.pow.f32(float, float) #0
++
++declare double @exp(double) #0
++declare float @expf(float) #0
++declare double @llvm.exp.f64(double) #0
++declare float @llvm.exp.f32(float) #0
++
++declare double @log(double) #0
++declare float @logf(float) #0
++declare double @llvm.log.f64(double) #0
++declare float @llvm.log.f32(float) #0
++
++
++define void @sin_f64(double* nocapture %varray) {
++; CHECK-LABEL: @sin_f64(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @sin(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @sin_f32(float* nocapture %varray) {
++; CHECK-LABEL: @sin_f32(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_sinf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @sinf(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @sin_f64_intrinsic(double* nocapture %varray) {
++; CHECK-LABEL: @sin_f64_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @llvm.sin.f64(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @sin_f32_intrinsic(float* nocapture %varray) {
++; CHECK-LABEL: @sin_f32_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_sinf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @llvm.sin.f32(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @cos_f64(double* nocapture %varray) {
++; CHECK-LABEL: @cos_f64(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @cos(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @cos_f32(float* nocapture %varray) {
++; CHECK-LABEL: @cos_f32(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_cosf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @cosf(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @cos_f64_intrinsic(double* nocapture %varray) {
++; CHECK-LABEL: @cos_f64_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @llvm.cos.f64(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @cos_f32_intrinsic(float* nocapture %varray) {
++; CHECK-LABEL: @cos_f32_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_cosf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @llvm.cos.f32(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
++; CHECK-LABEL: @pow_f64(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP2:%.*]], <4 x double> [[TMP3:%.*]])
++; CHECK:    [[TMP4:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP5:%.*]], <4 x double> [[TMP6:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %arrayidx = getelementptr inbounds double, double* %exp, i64 %iv
++  %tmp1 = load double, double* %arrayidx, align 4
++  %tmp2 = tail call double @pow(double %conv, double %tmp1)
++  %arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %tmp2, double* %arrayidx2, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) {
++; CHECK-LABEL: @pow_f64_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP2:%.*]], <4 x double> [[TMP3:%.*]])
++; CHECK:    [[TMP4:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP5:%.*]], <4 x double> [[TMP6:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %arrayidx = getelementptr inbounds double, double* %exp, i64 %iv
++  %tmp1 = load double, double* %arrayidx, align 4
++  %tmp2 = tail call double @llvm.pow.f64(double %conv, double %tmp1)
++  %arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %tmp2, double* %arrayidx2, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
++; CHECK-LABEL: @pow_f32(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_powf8_ha(<8 x float> [[TMP2:%.*]], <8 x float> [[WIDE_LOAD:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %arrayidx = getelementptr inbounds float, float* %exp, i64 %iv
++  %tmp1 = load float, float* %arrayidx, align 4
++  %tmp2 = tail call float @powf(float %conv, float %tmp1)
++  %arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %tmp2, float* %arrayidx2, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) {
++; CHECK-LABEL: @pow_f32_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_powf8_ha(<8 x float> [[TMP2:%.*]], <8 x float> [[TMP3:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %arrayidx = getelementptr inbounds float, float* %exp, i64 %iv
++  %tmp1 = load float, float* %arrayidx, align 4
++  %tmp2 = tail call float @llvm.pow.f32(float %conv, float %tmp1)
++  %arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %tmp2, float* %arrayidx2, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @exp_f64(double* nocapture %varray) {
++; CHECK-LABEL: @exp_f64(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @exp(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @exp_f32(float* nocapture %varray) {
++; CHECK-LABEL: @exp_f32(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_expf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @expf(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @exp_f64_intrinsic(double* nocapture %varray) {
++; CHECK-LABEL: @exp_f64_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @llvm.exp.f64(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @exp_f32_intrinsic(float* nocapture %varray) {
++; CHECK-LABEL: @exp_f32_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_expf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @llvm.exp.f32(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @log_f64(double* nocapture %varray) {
++; CHECK-LABEL: @log_f64(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @log(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @log_f32(float* nocapture %varray) {
++; CHECK-LABEL: @log_f32(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_logf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @logf(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @log_f64_intrinsic(double* nocapture %varray) {
++; CHECK-LABEL: @log_f64_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP2:%.*]])
++; CHECK:    [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to double
++  %call = tail call double @llvm.log.f64(double %conv)
++  %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
++  store double %call, double* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++define void @log_f32_intrinsic(float* nocapture %varray) {
++; CHECK-LABEL: @log_f32_intrinsic(
++; CHECK:    [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_logf8_ha(<8 x float> [[TMP2:%.*]])
++; CHECK:    ret void
++;
++entry:
++  br label %for.body
++
++for.body:
++  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
++  %tmp = trunc i64 %iv to i32
++  %conv = sitofp i32 %tmp to float
++  %call = tail call float @llvm.log.f32(float %conv)
++  %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
++  store float %call, float* %arrayidx, align 4
++  %iv.next = add nuw nsw i64 %iv, 1
++  %exitcond = icmp eq i64 %iv.next, 1000
++  br i1 %exitcond, label %for.end, label %for.body
++
++for.end:
++  ret void
++}
++
++attributes #0 = { nounwind readnone }
++
+diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll
+new file mode 100644
+index 0000000000000..9422653445dc2
+--- /dev/null
++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll
+@@ -0,0 +1,61 @@
++; Check that vector codegen splits an illegal sin8 call into two sin4 calls on AVX for the double datatype.
++; The C code used to generate this test:
++
++; #include <math.h>
++;
++; void foo(double *a, int N){
++;   int i;
++; #pragma clang loop vectorize_width(8)
++;   for (i=0;i<N;i++){
++;     a[i] = sin(i);
++;   }
++; }
++
++; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -force-vector-width=8 -mattr=avx -S < %s | FileCheck %s
++
++; CHECK: [[I1:%.*]] = sitofp <8 x i32> [[I0:%.*]] to <8 x double>
++; CHECK-NEXT: [[S1:%shuffle.*]] = shufflevector <8 x double> [[I1]], <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
++; CHECK-NEXT: [[I2:%.*]] = call fast intel_svmlcc256 <4 x double> @__svml_sin4(<4 x double> [[S1]])
++; CHECK-NEXT: [[S2:%shuffle.*]] = shufflevector <8 x double> [[I1]], <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
++; CHECK-NEXT: [[I3:%.*]] = call fast intel_svmlcc256 <4 x double> @__svml_sin4(<4 x double> [[S2]])
++; CHECK-NEXT: [[comb:%combined.*]] = shufflevector <4 x double> [[I2]], <4 x double> [[I3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
++; CHECK: store <8 x double> [[comb]], <8 x double>* [[TMP:%.*]], align 8
++
++
++target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
++target triple = "x86_64-unknown-linux-gnu"
++
++; Function Attrs: nounwind uwtable
++define dso_local void @foo(double* nocapture %a, i32 %N) local_unnamed_addr #0 {
++entry:
++  %cmp5 = icmp sgt i32 %N, 0
++  br i1 %cmp5, label %for.body.preheader, label %for.end
++
++for.body.preheader:                               ; preds = %entry
++  %wide.trip.count = zext i32 %N to i64
++  br label %for.body
++
++for.body:                                         ; preds = %for.body, %for.body.preheader
++  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
++  %0 = trunc i64 %indvars.iv to i32
++  %conv = sitofp i32 %0 to double
++  %call = tail call fast double @sin(double %conv) #2
++  %arrayidx = getelementptr inbounds double, double* %a, i64 %indvars.iv
++  store double %call, double* %arrayidx, align 8, !tbaa !2
++  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
++  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
++  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
++
++for.end:                                          ; preds = %for.body, %entry
++  ret void
++}
++
++; Function Attrs: nounwind
++declare dso_local double @sin(double) local_unnamed_addr #1
++
++!2 = !{!3, !3, i64 0}
++!3 = !{!"double", !4, i64 0}
++!4 = !{!"omnipotent char", !5, i64 0}
++!5 = !{!"Simple C/C++ TBAA"}
++!6 = distinct !{!6, !7}
++!7 = !{!"llvm.loop.vectorize.width", i32 8}
+diff --git a/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll b/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
+index e8c83c4d9bd1f..615fdc29176a2 100644
+--- a/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
++++ b/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
+@@ -12,12 +12,12 @@ target triple = "x86_64-unknown-linux-gnu"
+ 
+ ; COMMON-LABEL: @llvm.compiler.used = appending global
+ ; SVML-SAME:        [6 x i8*] [
+-; SVML-SAME:          i8* bitcast (<2 x double> (<2 x double>)* @__svml_sin2 to i8*),
+-; SVML-SAME:          i8* bitcast (<4 x double> (<4 x double>)* @__svml_sin4 to i8*),
+-; SVML-SAME:          i8* bitcast (<8 x double> (<8 x double>)* @__svml_sin8 to i8*),
+-; SVML-SAME:          i8* bitcast (<4 x float> (<4 x float>)* @__svml_log10f4 to i8*),
+-; SVML-SAME:          i8* bitcast (<8 x float> (<8 x float>)* @__svml_log10f8 to i8*),
+-; SVML-SAME:          i8* bitcast (<16 x float> (<16 x float>)* @__svml_log10f16 to i8*)
++; SVML-SAME:          i8* bitcast (<2 x double> (<2 x double>)* @__svml_sin2_ha to i8*),
++; SVML-SAME:          i8* bitcast (<4 x double> (<4 x double>)* @__svml_sin4_ha to i8*),
++; SVML-SAME:          i8* bitcast (<8 x double> (<8 x double>)* @__svml_sin8_ha to i8*),
++; SVML-SAME:          i8* bitcast (<4 x float> (<4 x float>)* @__svml_log10f4_ha to i8*),
++; SVML-SAME:          i8* bitcast (<8 x float> (<8 x float>)* @__svml_log10f8_ha to i8*),
++; SVML-SAME:          i8* bitcast (<16 x float> (<16 x float>)* @__svml_log10f16_ha to i8*)
+ ; MASSV-SAME:       [2 x i8*] [
+ ; MASSV-SAME:         i8* bitcast (<2 x double> (<2 x double>)* @__sind2 to i8*),
+ ; MASSV-SAME:         i8* bitcast (<4 x float> (<4 x float>)* @__log10f4 to i8*)
+@@ -59,9 +59,9 @@ declare float @llvm.log10.f32(float) #0
+ attributes #0 = { nounwind readnone }
+ 
+ ; SVML:      attributes #[[SIN]] = { "vector-function-abi-variant"=
+-; SVML-SAME:   "_ZGV_LLVM_N2v_sin(__svml_sin2),
+-; SVML-SAME:   _ZGV_LLVM_N4v_sin(__svml_sin4),
+-; SVML-SAME:   _ZGV_LLVM_N8v_sin(__svml_sin8)" }
++; SVML-SAME:   "_ZGV_LLVM_N2v_sin(__svml_sin2_ha),
++; SVML-SAME:   _ZGV_LLVM_N4v_sin(__svml_sin4_ha),
++; SVML-SAME:   _ZGV_LLVM_N8v_sin(__svml_sin8_ha)" }
+ 
+ ; MASSV:      attributes #[[SIN]] = { "vector-function-abi-variant"=
+ ; MASSV-SAME:   "_ZGV_LLVM_N2v_sin(__sind2)" }
+diff --git a/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt b/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
+index 97df6a55d1b59..199e0285c9e5d 100644
+--- a/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
++++ b/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
+@@ -47,6 +47,7 @@ add_tablegen(llvm-tblgen LLVM
+   SearchableTableEmitter.cpp
+   SubtargetEmitter.cpp
+   SubtargetFeatureInfo.cpp
++  SVMLEmitter.cpp
+   TableGen.cpp
+   Types.cpp
+   X86DisassemblerTables.cpp
+diff --git a/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp b/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp
+new file mode 100644
+index 0000000000000..a5aeea48db28b
+--- /dev/null
++++ b/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp
+@@ -0,0 +1,110 @@
++//===------ SVMLEmitter.cpp - Generate SVML function variants -------------===//
++//
++//                     The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++// This tablegen backend emits the scalar to svml function map for TLI.
++//
++//===----------------------------------------------------------------------===//
++
++#include "CodeGenTarget.h"
++#include "llvm/Support/Format.h"
++#include "llvm/TableGen/Error.h"
++#include "llvm/TableGen/Record.h"
++#include "llvm/TableGen/TableGenBackend.h"
++#include <map>
++#include <vector>
++
++using namespace llvm;
++
++#define DEBUG_TYPE "SVMLVariants"
++#include "llvm/Support/Debug.h"
++
++namespace {
++
++class SVMLVariantsEmitter {
++
++  RecordKeeper &Records;
++
++private:
++  void emitSVMLVariants(raw_ostream &OS);
++
++public:
++  SVMLVariantsEmitter(RecordKeeper &R) : Records(R) {}
++
++  void run(raw_ostream &OS);
++};
++} // End anonymous namespace
++
++/// \brief Emit the set of SVML variant function names.
++// The default is to emit the high accuracy SVML variants until a mechanism is
++// introduced to allow a selection of different variants through precision
++// requirements specified by the user. This code generates mappings to svml
++// from the scalar forms of llvm intrinsics, math library calls, and the
++// finite variants of math library calls.
++void SVMLVariantsEmitter::emitSVMLVariants(raw_ostream &OS) {
++
++  const unsigned MinSinglePrecVL = 4;
++  const unsigned MaxSinglePrecVL = 16;
++  const unsigned MinDoublePrecVL = 2;
++  const unsigned MaxDoublePrecVL = 8;
++
++  OS << "#ifdef GET_SVML_VARIANTS\n";
++
++  for (const auto &D : Records.getAllDerivedDefinitions("SvmlVariant")) {
++    StringRef SvmlVariantNameStr = D->getName();
++    // Single Precision SVML
++    for (unsigned VL = MinSinglePrecVL; VL <= MaxSinglePrecVL; VL *= 2) {
++      // Emit the scalar math library function to svml function entry.
++      OS << "{\"" << SvmlVariantNameStr << "f" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++
++      // Emit the scalar intrinsic to svml function entry.
++      OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f32" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++
++      // Emit the finite math library function to svml function entry.
++      OS << "{\"__" << SvmlVariantNameStr << "f_finite" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++    }
++
++    // Double Precision SVML
++    for (unsigned VL = MinDoublePrecVL; VL <= MaxDoublePrecVL; VL *= 2) {
++      // Emit the scalar math library function to svml function entry.
++      OS << "{\"" << SvmlVariantNameStr << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << "ElementCount::getFixed(" << VL
++         << ")},\n";
++
++      // Emit the scalar intrinsic to svml function entry.
++      OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f64" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << "ElementCount::getFixed(" << VL
++         << ")},\n";
++
++      // Emit the finite math library function to svml function entry.
++      OS << "{\"__" << SvmlVariantNameStr << "_finite" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++    }
++  }
++
++  OS << "#endif // GET_SVML_VARIANTS\n\n";
++}
++
++void SVMLVariantsEmitter::run(raw_ostream &OS) {
++  emitSVMLVariants(OS);
++}
++
++namespace llvm {
++
++void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS) {
++  SVMLVariantsEmitter(RK).run(OS);
++}
++
++} // End llvm namespace
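For a hypothetical SvmlVariant record named "sin", the loops above would emit entries of the following shape into the GET_SVML_VARIANTS section. This sample is reconstructed from the emitter code, not captured output, and shows only the first vector length of each precision:

// Reconstructed sample of the emitter's output for a hypothetical
// SvmlVariant record named "sin"; derived from the loops above.
#ifdef GET_SVML_VARIANTS
{"sinf", "__svml_sinf4", ElementCount::getFixed(4)},
{"llvm.sin.f32", "__svml_sinf4", ElementCount::getFixed(4)},
{"__sinf_finite", "__svml_sinf4", ElementCount::getFixed(4)},
{"sin", "__svml_sin2", ElementCount::getFixed(2)},
{"llvm.sin.f64", "__svml_sin2", ElementCount::getFixed(2)},
{"__sin_finite", "__svml_sin2", ElementCount::getFixed(2)},
#endif // GET_SVML_VARIANTS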
+diff --git a/llvm-14.0.6.src/utils/TableGen/TableGen.cpp b/llvm-14.0.6.src/utils/TableGen/TableGen.cpp
+index 2d4a45f889be6..603d0c223b33a 100644
+--- a/llvm-14.0.6.src/utils/TableGen/TableGen.cpp
++++ b/llvm-14.0.6.src/utils/TableGen/TableGen.cpp
+@@ -57,6 +57,7 @@ enum ActionType {
+   GenAutomata,
+   GenDirectivesEnumDecl,
+   GenDirectivesEnumImpl,
++  GenSVMLVariants,
+ };
+ 
+ namespace llvm {
+@@ -138,7 +139,9 @@ cl::opt<ActionType> Action(
+         clEnumValN(GenDirectivesEnumDecl, "gen-directive-decl",
+                    "Generate directive related declaration code (header file)"),
+         clEnumValN(GenDirectivesEnumImpl, "gen-directive-impl",
+-                   "Generate directive related implementation code")));
++                   "Generate directive related implementation code"),
++        clEnumValN(GenSVMLVariants, "gen-svml",
++                   "Generate SVML variant function names")));
+ 
+ cl::OptionCategory PrintEnumsCat("Options for -print-enums");
+ cl::opt<std::string> Class("class", cl::desc("Print Enum list for this class"),
+@@ -272,6 +275,9 @@ bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) {
+   case GenDirectivesEnumImpl:
+     EmitDirectivesImpl(Records, OS);
+     break;
++  case GenSVMLVariants:
++    EmitSVMLVariants(Records, OS);
++    break;
+   }
+ 
+   return false;
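With the new action wired in, the variant table can be regenerated by hand for inspection. A sketch of the invocation, where Svml.td is a placeholder name for the SVML definitions file (the .td file itself is not part of this hunk):

    llvm-tblgen -gen-svml Svml.td -o SVMLVariants.inc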
+diff --git a/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h b/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h
+index 71db8dc77b052..86c3a3068c2dc 100644
+--- a/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h
++++ b/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h
+@@ -93,6 +93,7 @@ void EmitExegesis(RecordKeeper &RK, raw_ostream &OS);
+ void EmitAutomata(RecordKeeper &RK, raw_ostream &OS);
+ void EmitDirectivesDecl(RecordKeeper &RK, raw_ostream &OS);
+ void EmitDirectivesImpl(RecordKeeper &RK, raw_ostream &OS);
++void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS);
+ 
+ } // End llvm namespace
+ 
+diff --git a/llvm-14.0.6.src/utils/vim/syntax/llvm.vim b/llvm-14.0.6.src/utils/vim/syntax/llvm.vim
+index 205db16b7d8cd..2572ab5a59e1b 100644
+--- a/llvm-14.0.6.src/utils/vim/syntax/llvm.vim
++++ b/llvm-14.0.6.src/utils/vim/syntax/llvm.vim
+@@ -104,6 +104,7 @@ syn keyword llvmKeyword
+       \ inreg
+       \ intel_ocl_bicc
+       \ inteldialect
++      \ intel_svmlcc
+       \ internal
+       \ jumptable
+       \ linkonce
diff --git a/py-llvmlite/log b/py-llvmlite/log
new file mode 100644
index 0000000000..6592c65247
--- /dev/null
+++ b/py-llvmlite/log
@@ -0,0 +1,44 @@
+v0.43.0 (June 13, 2024)
+-----------------------
+
+Highlights of this release are:
+
+- Support for building against LLVM 15.
+- A fix for the `refpruning` algorithm in specific `fanout_raise` cases.
+
+Pull-Requests:
+
+* PR `#1025`_: skip `raise` basic blocks in `verifyFanoutBackward`
+* PR `#1029`_: Update CHANGE_LOG for 0.42.0 final.
+* PR `#1032`_: v0.42 Post release 
+* PR `#1035`_:  Support building against llvm15
+* PR `#1059`_: update CHANGE_LOG and release date for 0.43.0 final
+
+v0.42.0 (January 31, 2024)
+--------------------------
+
+Highlights of this release include:
+
+- Support for Python 3.12.
+- A fix for relocation overflows on AArch64 systems.
+- Binding layer: new queries for incoming blocks of phi instructions, type
+  kinds, and elements. Addition of the Instruction Namer pass.
+- IR layer: Support `convergent` as an attribute of function calls and call
+  instructions.
+
+Pull-Requests:
+
+* PR `#973`_: Bindings: Query incoming blocks of a phi instruction
+* PR `#978`_: Bindings: Query type kinds, derived types, and elements
+* PR `#981`_: Add Instruction Namer pass to PassManager
+* PR `#993`_: Update changelog on main for 0.41.0
+* PR `#1005`_: Remove suggestion that add_global_mapping() is unused
+* PR `#1006`_: Release Notes 0.41.1 for main
+* PR `#1007`_: update release checklists post 0.41.1
+* PR `#1009`_: Fix relocation overflows by implementing preallocation in the memory manager
+* PR `#1010`_: Python 3.12
+* PR `#1012`_: conda-recipe cleanups
+* PR `#1014`_: Fix conda-recipe syntax errors from #1012
+* PR `#1017`_: add 3.12 to azure
+* PR `#1018`_: Bump minimum supported Python version to 3.9
+* PR `#1019`_: Add convergent as a supported FunctionAttribute and CallInstrAttribute.
diff --git a/py-llvmlite/patches/patch-ffi_build.py b/py-llvmlite/patches/patch-ffi_build.py
new file mode 100644
index 0000000000..3dc05e9d95
--- /dev/null
+++ b/py-llvmlite/patches/patch-ffi_build.py
@@ -0,0 +1,16 @@
+$NetBSD: patch-ffi_build.py,v 1.7 2020/05/12 08:08:08 adam Exp $
+
+Add NetBSD support.
+https://github.com/numba/llvmlite/pull/1074
+
+--- ffi/build.py.orig	2020-05-08 14:22:24.000000000 +0000
++++ ffi/build.py
+@@ -182,6 +182,8 @@ def main():
+         main_posix('linux', '.so')
+     elif sys.platform.startswith(('freebsd','openbsd')):
+         main_posix('freebsd', '.so')
++    elif sys.platform.startswith('netbsd'):
++        main_posix('netbsd', '.so')
+     elif sys.platform == 'darwin':
+         main_posix('osx', '.dylib')
+     else:

