pkgsrc-Changes-HG archive

[pkgsrc/trunk]: pkgsrc/net/youtube-dl youtube-dl: Update to 20201118



details:   https://anonhg.NetBSD.org/pkgsrc/rev/61e50b50875c
branches:  trunk
changeset: 442095:61e50b50875c
user:      leot <leot%pkgsrc.org@localhost>
date:      Wed Nov 18 17:35:15 2020 +0000

description:
youtube-dl: Update to 20201118

pkgsrc changes:
 - Remove patch-youtube__dl_extractor_bandcamp.py, fixed differently upstream
 - Update patch-youtube__dl_extractor_rai.py to current rai extractor
 - Add a reference to upstream pull request in patch-youtube__dl_extractor_la7.py

Changes:
2020.11.18
----------
Extractors
* [spiegel] Fix extraction (#24206, #24767)
* [youtube] Improve extraction
    + Add support for --no-playlist (#27009)
    * Improve playlist and mix extraction (#26390, #26509, #26534, #27011)
    + Extract playlist uploader data
* [youtube:tab] Fix view count extraction (#27051)
* [malltv] Fix extraction (#27035)
+ [bandcamp] Extract playlist description (#22684)
* [urplay] Fix extraction (#26828)
* [youtube:tab] Fix playlist title extraction (#27015)
* [youtube] Fix chapters extraction (#26005)


2020.11.17
----------
Core
* [utils] Skip ! prefixed code in js_to_json

Extractors
* [youtube:tab] Fix extraction with cookies provided (#27005)
* [lrt] Fix extraction with empty tags (#20264)
+ [ndr:embed:base] Extract subtitles (#25447, #26106)
+ [servus] Add support for pm-wissen.com (#25869)
* [servus] Fix extraction (#26872, #26967, #26983, #27000)
* [xtube] Fix extraction (#26996)
* [lrt] Fix extraction
+ [lbry] Add support for lbry.tv
+ [condenast] Extract subtitles
* [condenast] Fix extraction
* [bandcamp] Fix extraction (#26681, #26684)
* [rai] Fix RaiPlay extraction (#26064, #26096)
* [vlive] Fix extraction
* [usanetwork] Fix extraction
* [nbc] Fix NBCNews/Today/MSNBC extraction
* [cnbc] Fix extraction

diffstat:

 net/youtube-dl/Makefile                                        |    4 +-
 net/youtube-dl/PLIST                                           |    8 +-
 net/youtube-dl/distinfo                                        |   15 +-
 net/youtube-dl/patches/patch-youtube__dl_extractor_bandcamp.py |  223 ----------
 net/youtube-dl/patches/patch-youtube__dl_extractor_la7.py      |   15 +-
 net/youtube-dl/patches/patch-youtube__dl_extractor_rai.py      |  114 +---
 6 files changed, 58 insertions(+), 321 deletions(-)

diffs (truncated from 573 to 300 lines):

diff -r e333d5aba248 -r 61e50b50875c net/youtube-dl/Makefile
--- a/net/youtube-dl/Makefile   Wed Nov 18 15:19:26 2020 +0000
+++ b/net/youtube-dl/Makefile   Wed Nov 18 17:35:15 2020 +0000
@@ -1,8 +1,8 @@
-# $NetBSD: Makefile,v 1.218 2020/11/12 14:41:38 leot Exp $
+# $NetBSD: Makefile,v 1.219 2020/11/18 17:35:15 leot Exp $
 
 # XXX: VERSION_DATE can contains also an optional part that indicates
 # XXX: possible same day revisions. PKGNAME preserves that dotted part as is.
-VERSION_DATE=  2020.11.12
+VERSION_DATE=  2020.11.18
 DISTNAME=      youtube-dl-${VERSION_DATE}
 PKGNAME=       ${DISTNAME:S/.//:S/.//}
 CATEGORIES=    net
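
Note on the Makefile hunk above: PKGNAME is derived from DISTNAME with two `:S/.//` modifiers, which is why the commit subject says 20201118 while the distfile is named youtube-dl-2020.11.18. The following is a minimal Python sketch of that substitution, assuming bmake's `:S/old/new/` without the `g` flag replaces only the first match. The function names and the same-day revision value `2020.11.18.1` are hypothetical and used only for illustration.

```python
def bmake_s_once(value, old, new):
    """Mimic bmake's :S/old/new/ without the g flag: replace the first match only."""
    return value.replace(old, new, 1)


def pkgname_from(version_date):
    """Sketch of PKGNAME=${DISTNAME:S/.//:S/.//} for a given VERSION_DATE."""
    distname = "youtube-dl-" + version_date
    # The modifier is applied twice, so only the first two dots are removed.
    return bmake_s_once(bmake_s_once(distname, ".", ""), ".", "")


print(pkgname_from("2020.11.18"))
# Hypothetical same-day revision: the trailing dotted part is preserved.
print(pkgname_from("2020.11.18.1"))
```

Because only two dots are stripped, any optional same-day revision suffix keeps its dot, which matches the XXX comment in the Makefile.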
diff -r e333d5aba248 -r 61e50b50875c net/youtube-dl/PLIST
--- a/net/youtube-dl/PLIST      Wed Nov 18 15:19:26 2020 +0000
+++ b/net/youtube-dl/PLIST      Wed Nov 18 17:35:15 2020 +0000
@@ -1,4 +1,4 @@
-@comment $NetBSD: PLIST,v 1.99 2020/02/16 19:28:47 leot Exp $
+@comment $NetBSD: PLIST,v 1.100 2020/11/18 17:35:15 leot Exp $
 bin/youtube-dl
 ${PYSITELIB}/${EGG_INFODIR}/PKG-INFO
 ${PYSITELIB}/${EGG_INFODIR}/SOURCES.txt
@@ -968,6 +968,9 @@
 ${PYSITELIB}/youtube_dl/extractor/laola1tv.py
 ${PYSITELIB}/youtube_dl/extractor/laola1tv.pyc
 ${PYSITELIB}/youtube_dl/extractor/laola1tv.pyo
+${PYSITELIB}/youtube_dl/extractor/lbry.py
+${PYSITELIB}/youtube_dl/extractor/lbry.pyc
+${PYSITELIB}/youtube_dl/extractor/lbry.pyo
 ${PYSITELIB}/youtube_dl/extractor/lci.py
 ${PYSITELIB}/youtube_dl/extractor/lci.pyc
 ${PYSITELIB}/youtube_dl/extractor/lci.pyo
@@ -1709,9 +1712,6 @@
 ${PYSITELIB}/youtube_dl/extractor/spiegel.py
 ${PYSITELIB}/youtube_dl/extractor/spiegel.pyc
 ${PYSITELIB}/youtube_dl/extractor/spiegel.pyo
-${PYSITELIB}/youtube_dl/extractor/spiegeltv.py
-${PYSITELIB}/youtube_dl/extractor/spiegeltv.pyc
-${PYSITELIB}/youtube_dl/extractor/spiegeltv.pyo
 ${PYSITELIB}/youtube_dl/extractor/spike.py
 ${PYSITELIB}/youtube_dl/extractor/spike.pyc
 ${PYSITELIB}/youtube_dl/extractor/spike.pyo
diff -r e333d5aba248 -r 61e50b50875c net/youtube-dl/distinfo
--- a/net/youtube-dl/distinfo   Wed Nov 18 15:19:26 2020 +0000
+++ b/net/youtube-dl/distinfo   Wed Nov 18 17:35:15 2020 +0000
@@ -1,11 +1,10 @@
-$NetBSD: distinfo,v 1.200 2020/11/12 14:41:38 leot Exp $
+$NetBSD: distinfo,v 1.201 2020/11/18 17:35:15 leot Exp $
 
-SHA1 (youtube-dl-2020.11.12.tar.gz) = 04e72d0b0a0e85b79a6c2ac93b7c85254b95b53b
-RMD160 (youtube-dl-2020.11.12.tar.gz) = 2afd73b5c09463951086b29298489f0d203a2207
-SHA512 (youtube-dl-2020.11.12.tar.gz) = 7db373f6cc252635a3613ffe0b3b10640e262778105ebbd78b837fe019b0a2609032d2aeb81b239e000a86220aff99d2c018a9a6325adad6981a8ab64048131c
-Size (youtube-dl-2020.11.12.tar.gz) = 3188015 bytes
+SHA1 (youtube-dl-2020.11.18.tar.gz) = e1b922ebc543f35ea7ee3de7e28e8deea7b97914
+RMD160 (youtube-dl-2020.11.18.tar.gz) = e526d2c4f297390cba92ae91d70223e9be3171e6
+SHA512 (youtube-dl-2020.11.18.tar.gz) = 110de857759b4c4bd0160242adebb3d8690bda2203a28a7b1a2ac1cdd9bca058702fd0b323010629e74bbb2df38f50c67b710bc2a6ad4cc907827ee013d0dbcf
+Size (youtube-dl-2020.11.18.tar.gz) = 3186065 bytes
 SHA1 (patch-setup.py) = a67074ae7cfe5e77847c2f610337ea553eddb69b
-SHA1 (patch-youtube__dl_extractor_bandcamp.py) = 81855a3f4f8c03f61fe543eb339c0e67bf52682e
-SHA1 (patch-youtube__dl_extractor_la7.py) = e246750808305343227060acdc5a38583ef071e9
-SHA1 (patch-youtube__dl_extractor_rai.py) = 3dbad7852b38e7364a248a5c9851c50cd2ff9b38
+SHA1 (patch-youtube__dl_extractor_la7.py) = 6c579f96e7ace1b64ef25fe8788b40bc4e7e67dd
+SHA1 (patch-youtube__dl_extractor_rai.py) = 5ec18da74c46f2195fe814d61ca044df4b70cc45
 SHA1 (patch-youtube__dl_postprocessor_ffmpeg.py) = f96676170a448d9205d542a7def4beca615a1490
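
Note on the distinfo hunk above: each entry has the shape `ALGO (file) = value`, with a separate `Size` entry in bytes. The following is a rough Python sketch of how such entries could be parsed and checked against a downloaded distfile. It is not the pkgsrc tooling itself; the helper names are hypothetical, and RMD160 is deliberately skipped because its availability in `hashlib` depends on the local OpenSSL build.

```python
import hashlib
import re

# Matches "ALGO (file) = value", the line shape shown in the distinfo hunk.
DISTINFO_LINE = re.compile(r'^(?P<algo>\w+) \((?P<name>[^)]+)\) = (?P<value>.+)$')


def parse_distinfo(text):
    """Return {filename: {algo: value}} for every checksum or Size line."""
    entries = {}
    for line in text.splitlines():
        m = DISTINFO_LINE.match(line.strip())
        if m:
            entries.setdefault(m.group('name'), {})[m.group('algo')] = m.group('value')
    return entries


def verify(data, expected):
    """Check bytes against whichever of SHA1, SHA512 and Size are present."""
    ok = True
    if 'SHA1' in expected:
        ok = ok and hashlib.sha1(data).hexdigest() == expected['SHA1']
    if 'SHA512' in expected:
        ok = ok and hashlib.sha512(data).hexdigest() == expected['SHA512']
    if 'Size' in expected:
        ok = ok and '%d bytes' % len(data) == expected['Size']
    return ok


sample = (
    "SHA1 (youtube-dl-2020.11.18.tar.gz) = e1b922ebc543f35ea7ee3de7e28e8deea7b97914\n"
    "Size (youtube-dl-2020.11.18.tar.gz) = 3186065 bytes\n"
)
info = parse_distinfo(sample)
print(info['youtube-dl-2020.11.18.tar.gz']['Size'])
```

With a real distfile, `verify(open(path, 'rb').read(), info[name])` would report whether the download matches the recorded values.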
diff -r e333d5aba248 -r 61e50b50875c net/youtube-dl/patches/patch-youtube__dl_extractor_bandcamp.py
--- a/net/youtube-dl/patches/patch-youtube__dl_extractor_bandcamp.py    Wed Nov 18 15:19:26 2020 +0000
+++ /dev/null   Thu Jan 01 00:00:00 1970 +0000
@@ -1,223 +0,0 @@
-$NetBSD: patch-youtube__dl_extractor_bandcamp.py,v 1.1 2020/11/01 10:58:24 leot Exp $
-
-[bandcamp] Update to handle HTML quoted data
-
-Adjust the extractor to handle JSON data-* attributes by introducing a
-_json_data_extract() method to handle them (and existing existing
-patterns in the code).
-
-Based on Gilles Pietri #26684.
-
---- youtube_dl/extractor/bandcamp.py.orig      2020-09-20 05:29:46.000000000 +0000
-+++ youtube_dl/extractor/bandcamp.py
-@@ -35,12 +35,15 @@ class BandcampIE(InfoExtractor):
-             'ext': 'mp3',
-             'title': "youtube-dl  \"'/\\\u00e4\u21ad - youtube-dl test song \"'/\\\u00e4\u21ad",
-             'duration': 9.8485,
-+            'uploader': "youtube-dl  \"'/\\\u00e4\u21ad",
-+            'timestamp': 1354224127,
-+            'upload_date': '20121129',
-         },
-         '_skip': 'There is a limit of 200 free downloads / month for the test song'
-     }, {
-         # free download
-         'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
--        'md5': '853e35bf34aa1d6fe2615ae612564b36',
-+        'md5': '149170678c0a81a009c69566bf42920a',
-         'info_dict': {
-             'id': '2650410135',
-             'ext': 'aiff',
-@@ -79,6 +82,14 @@ class BandcampIE(InfoExtractor):
-         },
-     }]
- 
-+    def _json_data_extract(self, data_key, video_id, webpage):
-+        return self._parse_json(
-+            self._search_regex(
-+                r'data-' + data_key + r'=(["\'])(?P<data>{.+?})\1',
-+                webpage, 'JSON data {data_key}'.format(data_key=data_key),
-+                group='data', default=None),
-+            video_id, transform_source=unescapeHTML)
-+
-     def _real_extract(self, url):
-         mobj = re.match(self._VALID_URL, url)
-         title = mobj.group('title')
-@@ -91,10 +102,9 @@ class BandcampIE(InfoExtractor):
-         duration = None
- 
-         formats = []
--        track_info = self._parse_json(
--            self._search_regex(
--                r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
--                webpage, 'track info', default='{}'), title)
-+        tralbum_data = self._json_data_extract('tralbum', title, webpage)
-+        embed_data = self._json_data_extract('embed', title, webpage)
-+        track_info = tralbum_data['trackinfo'][0]
-         if track_info:
-             file_ = track_info.get('file')
-             if isinstance(file_, dict):
-@@ -110,38 +120,28 @@ class BandcampIE(InfoExtractor):
-                         'acodec': ext,
-                         'abr': int_or_none(abr_str),
-                     })
--            track = track_info.get('title')
-             track_id = str_or_none(track_info.get('track_id') or track_info.get('id'))
-             track_number = int_or_none(track_info.get('track_num'))
-             duration = float_or_none(track_info.get('duration'))
- 
-         def extract(key):
--            return self._search_regex(
--                r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
--                webpage, key, default=None, group='value')
-+            for data in tralbum_data['current'], embed_data, tralbum_data:
-+                if key in data and data[key]:
-+                    return data[key]
- 
-         artist = extract('artist')
-+        track = extract('title')
-         album = extract('album_title')
-         timestamp = unified_timestamp(
-             extract('publish_date') or extract('album_publish_date'))
-         release_date = unified_strdate(extract('album_release_date'))
- 
--        download_link = self._search_regex(
--            r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
--            'download link', default=None, group='url')
-+        download_link = tralbum_data['freeDownloadPage']
-         if download_link:
--            track_id = self._search_regex(
--                r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
--                webpage, 'track id')
--
-             download_webpage = self._download_webpage(
-                 download_link, track_id, 'Downloading free downloads page')
- 
--            blob = self._parse_json(
--                self._search_regex(
--                    r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
--                    'blob', group='blob'),
--                track_id, transform_source=unescapeHTML)
-+            blob = self._json_data_extract('blob', track_id, download_webpage)
- 
-             info = try_get(
-                 blob, (lambda x: x['digital_items'][0],
-@@ -218,7 +218,7 @@ class BandcampIE(InfoExtractor):
-         }
- 
- 
--class BandcampAlbumIE(InfoExtractor):
-+class BandcampAlbumIE(BandcampIE):
-     IE_NAME = 'Bandcamp:album'
-     _VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?'
- 
-@@ -299,26 +299,23 @@ class BandcampAlbumIE(InfoExtractor):
-         album_id = mobj.group('album_id')
-         playlist_id = album_id or uploader_id
-         webpage = self._download_webpage(url, playlist_id)
--        track_elements = re.findall(
--            r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage)
-+
-+        tralbum_data = self._json_data_extract('tralbum', album_id, webpage)
-+        embed_data = self._json_data_extract('embed', album_id, webpage)
-+        title = embed_data.get('album_title')
-+
-+        track_elements = tralbum_data['trackinfo']
-         if not track_elements:
-             raise ExtractorError('The page doesn\'t contain any tracks')
-         # Only tracks with duration info have songs
-         entries = [
-             self.url_result(
--                compat_urlparse.urljoin(url, t_path),
-+                compat_urlparse.urljoin(url, t['title_link']),
-                 ie=BandcampIE.ie_key(),
--                video_title=self._search_regex(
--                    r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)',
--                    elem_content, 'track title', fatal=False))
--            for elem_content, t_path in track_elements
--            if self._html_search_meta('duration', elem_content, default=None)]
--
--        title = self._html_search_regex(
--            r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
--            webpage, 'title', fatal=False)
--        if title:
--            title = title.replace(r'\"', '"')
-+                video_title=t['title'])
-+            for t in track_elements
-+            if t['duration']]
-+
-         return {
-             '_type': 'playlist',
-             'uploader_id': uploader_id,
-@@ -328,22 +325,21 @@ class BandcampAlbumIE(InfoExtractor):
-         }
- 
- 
--class BandcampWeeklyIE(InfoExtractor):
-+class BandcampWeeklyIE(BandcampIE):
-     IE_NAME = 'Bandcamp:weekly'
-     _VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
-     _TESTS = [{
-         'url': 'https://bandcamp.com/?show=224',
--        'md5': 'b00df799c733cf7e0c567ed187dea0fd',
-+        'md5': '61acc9a002bed93986b91168aa3ab433',
-         'info_dict': {
-             'id': '224',
--            'ext': 'opus',
-+            'ext': 'mp3',
-             'title': 'BC Weekly April 4th 2017 - Magic Moments',
-             'description': 'md5:5d48150916e8e02d030623a48512c874',
-             'duration': 5829.77,
-             'release_date': '20170404',
-             'series': 'Bandcamp Weekly',
-             'episode': 'Magic Moments',
--            'episode_number': 208,
-             'episode_id': '224',
-         }
-     }, {
-@@ -355,13 +351,13 @@ class BandcampWeeklyIE(InfoExtractor):
-         video_id = self._match_id(url)
-         webpage = self._download_webpage(url, video_id)
- 
--        blob = self._parse_json(
--            self._search_regex(
--                r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
--                'blob', group='blob'),
--            video_id, transform_source=unescapeHTML)
-+        blob = self._json_data_extract('blob', video_id, webpage)
- 
--        show = blob['bcw_show']
-+        show = None
-+        for bd in blob['bcw_data']:
-+            if blob['bcw_data'][bd].get('expanded'):
-+                show = blob['bcw_data'][bd]
-+                break
- 
-         # This is desired because any invalid show id redirects to `bandcamp.com`
-         # which happens to expose the latest Bandcamp Weekly episode.
-@@ -390,18 +386,6 @@ class BandcampWeeklyIE(InfoExtractor):
-         if subtitle:
-             title += ' - %s' % subtitle
- 
--        episode_number = None
--        seq = blob.get('bcw_seq')
--
--        if seq and isinstance(seq, list):
--            try:
--                episode_number = next(
--                    int_or_none(e.get('episode_number'))
--                    for e in seq
--                    if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
--            except StopIteration:
--                pass
--
-         return {
-             'id': video_id,
-             'title': title,
-@@ -411,7 +395,6 @@ class BandcampWeeklyIE(InfoExtractor):
-             'release_date': unified_strdate(show.get('published_date')),
-             'series': 'Bandcamp Weekly',
-             'episode': show.get('subtitle'),
--            'episode_number': episode_number,
-             'episode_id': compat_str(video_id),
-             'formats': formats
-         }
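
Note on the removed bandcamp patch above: its `_json_data_extract()` helper located a `data-<key>` attribute, HTML-unescaped the value, and parsed it as JSON. The following is a standalone sketch of that idea using only the Python standard library. It stands in for youtube-dl's `_search_regex`, `_parse_json` and `unescapeHTML` and is not the upstream fix; the function name `extract_json_data` and the sample markup are made up for illustration.

```python
import html
import json
import re


def extract_json_data(data_key, webpage):
    """Return the JSON object stored in a data-<data_key> attribute, or None."""
    m = re.search(
        r'data-' + re.escape(data_key) + r'=(["\'])(?P<data>\{.+?\})\1',
        webpage)
    if m is None:
        return None
    # Attribute values are HTML-escaped (&quot; and friends), so unescape first.
    return json.loads(html.unescape(m.group('data')))


# Invented sample page with an HTML-escaped JSON attribute.
page = ('<div data-tralbum="{&quot;trackinfo&quot;:[{&quot;title&quot;:'
        '&quot;demo&quot;,&quot;duration&quot;:9.5}]}"></div>')
tralbum = extract_json_data('tralbum', page)
print(tralbum['trackinfo'][0]['title'])
```

Returning None for a missing attribute lets callers decide whether the absence is fatal, which is a design choice of this sketch rather than something the patch specifies.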
diff -r e333d5aba248 -r 61e50b50875c net/youtube-dl/patches/patch-youtube__dl_extractor_la7.py
--- a/net/youtube-dl/patches/patch-youtube__dl_extractor_la7.py Wed Nov 18 15:19:26 2020 +0000
+++ b/net/youtube-dl/patches/patch-youtube__dl_extractor_la7.py Wed Nov 18 17:35:15 2020 +0000
@@ -1,4 +1,4 @@
-$NetBSD: patch-youtube__dl_extractor_la7.py,v 1.2 2020/03/23 20:32:23 leot Exp $
+$NetBSD: patch-youtube__dl_extractor_la7.py,v 1.3 2020/11/18 17:35:15 leot Exp $
 
 [la7] Fix extraction (closes #23323)


