Negative Lookbehinds - Fortune

GordonFreeman hi
rindolf Hi GordonFreeman
GordonFreeman grep -Po '(?<=<a )(?<! href=)(?<= href=["]*)[^">]+' <<< '<a gfasg href=asdf>'
GordonFreeman grep: lookbehind assertion is not fixed length
rindolf GordonFreeman: grep is PCRE - it's not Perl.
rindolf perlbot: pcre
Altreus GordonFreeman: don't use regex for HTML
perlbot rindolf: PCRE is not Perl. It lacks several features of Perl regexes. Don't bother asking for help with a PCRE pattern in a Perl channel as the answers will not be relevant. Try #regex, or the channel for your language. See also http://en.wikipedia.org/wiki/PCRE#Differences_from_Perl and LPBD.
GordonFreeman but this should work i think.
mauke no, it shouldn't
GordonFreeman though it fails at the second lookbehind ...
mauke no, it doesn't
GordonFreeman and fails at "* too
GordonFreeman (grep -Po '<a +.* +href="*[^" >]+' | grep -Po '(?=<a ).*' | grep -Po '(?<= href=)["]*[^" >]+') <<< '<a gfasg href=asdf><a fgfgg="hi> " href="link" >'
GordonFreeman this works.
mauke GordonFreeman: dude.
anno don't paste!
GordonFreeman hi mauke
apeiron where's mauke's car?
rindolf apeiron: :-)
mauke it's a cdr
Altreus I watched that the other day
rindolf pkrumins: what's up?
Altreus I don't really know why
mauke GordonFreeman: go to a channel where that is on-topic
GordonFreeman mauke<< like?
mauke no idea
Altreus where on earth is parsing HTML with regexes on topic?
GordonFreeman aham ok
Altreus except ##php lolol
GordonFreeman well i think one can see its logical and it works like this
rindolf GordonFreeman: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
shorten rindolf's url is at http://xrl.us/bf4jh6
apeiron GordonFreeman, also, -P isn't perl.
thrig Altreus: some special level of hell, between the angry ghosts and the hungry ghosts
rindolf perlbot: html
apeiron the grep docs lie to you.
perlbot rindolf: Don't parse or modify html with regular expressions! See one of HTML::Parser's subclasses: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder(::Xpath)?, HTML::TableExtract etc. If your response begins "that's overkill. i only want to..." you are wrong. http://en.wikipedia.org/wiki/Chomsky_hierarchy and http://xrl.us/bf4jh6 for why not to use regex on HTML
LeoNerd Altreus: Why, surely in #html-parsing-by-regexp
Altreus if you want perl regex use ack
Altreus surely
rindolf LeoNerd: sounds like programmers' hell.
anno perl regex doesn't support variable-length lookbehind either
Altreus apeiron: actually it says it's highly experimental and hence not working
Altreus it could well be Perl and not PCRE when finished :)
Altreus not that "perl regex" is a defined term, the speed Perl is moving
yrlnry That's why you should never use Perl's builtin regexes. Just write your own package, it's sure to be more reliable.
rindolf yrlnry: :-)
talexb Heh.
LeoNerd use re::engine::vim;
rindolf yrlnry++
Altreus LeoNerd: is it core?
yrlnry HOP has a nice implementation. It works by generating a list of every string matched by the regex, and looking to see if your target string is in the list.
LeoNerd I can't help thinking that may not be optimal in terms of CPU or memory usage
talexb yrlnry, no doubt they have a Cray working on generating the list ..
yrlnry LeoNerd: Depends; unlike Perl regexes, it has no trouble handling languages higher up the Chomsky hierarchy
yrlnry It is guaranteed to return the right answer for any recursive language, and guaranteed to return correct 'matched' answers for any recursively enumerable language.
LeoNerd Oh sure...
LeoNerd In terms of CS guarantees it's very nice
yrlnry So if you are in a big hurry to get the wrong answer...
LeoNerd But I live in the practical pragmatic world
LeoNerd E.g. Parser::MGC is horribly slow at backtracking and whatnot, but I write parsers in it because those are still fast for "reasonably" sized inputs, parsers are fast to write, and I like having lots of side-effects and dynamic logic -in- Perl
Altreus Unfortunately my universe doesn't have infinite processing speeds and data storage
anno a universe with infinite processing speed would have processed you by now
Altreus and
Altreus would have processed my grandchildren too
yrlnry This algorithm doesn't need infinite speed or storage.
yrlnry It works slowly, but finitely.
Altreus what
yrlnry The infinite list is lazily generated and you never have more than one of its elements in memory at any time.
rindolf yrlnry: is it sorted by length?
yrlnry You will learn this sort of technique after you have been programming in Perl for eight months or so.
Altreus how do you know when it doesn't match
Altreus yrlnry: :D
yrlnry rindolf: it is sorted by length, and lexicographically among strings of the same length.
rindolf yrlnry: ah.
yrlnry Of course, you cannot do the length-sorting thing for arbitrary languages, but for regex languages there is no trouble.
yrlnry http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
LeoNerd Eh..
LeoNerd I dunno. I just dislike purely RE-based parsing
LeoNerd I much prefer code doing it
GordonFreeman why can't perl regexp do variable length lookbehind matching?
Altreus See originally I ignored you because it sounded like you were talking shit
LeoNerd Limit of the implementation
Altreus mainly because it is possible to construct a regex with an infinite range that nevertheless won't match a particular string
anno GordonFreeman: who knows? looks like it's hard to implement with the given engine
mauke GordonFreeman: unclear semantics and no one's bothered to write the code
GordonFreeman i see
Altreus Plus, there's a fucking lot of unicode to create strings out of
LeoNerd It's not "hard" to implement. It's impossible given the algorithm being used
mauke LeoNerd: why impossible?
yrlnry LeoNerd: I don't think that's true. It could be done using a recursive call to the regex engine now that that is possible.
GordonFreeman but lookbehind is cool
LeoNerd Oooh.. yes.. I suppose it could do that now
GordonFreeman it's like a reverse regexp that can be excluded
anno vim re's do it
LeoNerd vim uses a different type of engine
anno right
yrlnry Altreus: I was talking shit. After eight months you get a license to do that.
mauke really?
Altreus yrlnry: but there's a pdf
yrlnry where's a PDF?
Altreus 17:10 < yrlnry> http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
yrlnry Yes.
Altreus I didn't open it or anything
mauke no one opens pdfs
yrlnry PDFs are for cowards and Slavs.
Altreus but it lent enough credence to your words that I decided to believe your spurious claims
Altreus Actually someone did a test the other day
yrlnry Oh, does "talking shit" mean "making up nonsense"? Then I was not talking shit.
Altreus He linked someone to articles supporting his viewpoint and they changed their mind
yrlnry It is in section 6.5, "regex string generation".
Altreus but one of the articles was an argument against himself
Altreus Showing that it is enough to cite your sources to be believed; not many people will actually bother to check them
Altreus yrlnry: what do you normally think "talking shit" means?
Altreus are you confusing it with shooting the shit
yrlnry I'm not sure.
Altreus are you foreign
yrlnry Yes.
Altreus ok then
mauke hahaha
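
Note on the original question: the advice from Altreus and perlbot was to use an HTML parser rather than lookbehinds. The channel recommended Perl modules such as HTML::TokeParser; the sketch below is only a rough Python equivalent using the standard library's html.parser, run on the sample input from the log. The names HrefCollector and extract_hrefs are made up for illustration and do not come from the conversation.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> start tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes without a value, such as the stray "gfasg" in the log.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return all href values found in <a> tags, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    # Sample input taken from GordonFreeman's grep pipeline above.
    sample = '<a gfasg href=asdf><a fgfgg="hi> " href="link" >'
    print(extract_hrefs(sample))
```

The parser handles both the unquoted value and the quoted attribute containing ">", which is what the three-stage grep pipeline was working around.
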
Channel #perl
Network Freenode
Tagline Negative Lookbehind Regexes for matching HTML
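
Note on yrlnry's enumeration idea: the approach described above (list the strings a regex matches, ordered by length and then lexicographically, and look for the target in that list) can be sketched roughly as follows. This is not the code from HOP section 6.5, which is Perl and uses lazy streams; it is a simplified Python sketch under stated assumptions. The tuple-based regex representation and the function names are hypothetical, and unlike the stream version it builds the full set of strings for each length in memory. Because candidates come out in length order, the search can stop once they are longer than the target, which is one answer to Altreus's "how do you know when it doesn't match".

```python
from functools import lru_cache
from itertools import count, islice

# Hypothetical mini-representation of a regex as nested tuples:
#   ("lit", "a")      a single literal character
#   ("cat", r1, r2)   r1 followed by r2
#   ("alt", r1, r2)   r1 or r2
#   ("star", r)       zero or more repetitions of r


@lru_cache(maxsize=None)
def strings_of_length(regex, n):
    """Return the frozenset of strings of exactly length n matched by regex."""
    kind = regex[0]
    if kind == "lit":
        return frozenset([regex[1]]) if n == 1 else frozenset()
    if kind == "alt":
        return strings_of_length(regex[1], n) | strings_of_length(regex[2], n)
    if kind == "cat":
        # Split the length n between the two halves in every possible way.
        return frozenset(
            x + y
            for k in range(n + 1)
            for x in strings_of_length(regex[1], k)
            for y in strings_of_length(regex[2], n - k)
        )
    if kind == "star":
        if n == 0:
            return frozenset([""])
        # First repetition has length k >= 1, so the recursion terminates.
        return frozenset(
            x + y
            for k in range(1, n + 1)
            for x in strings_of_length(regex[1], k)
            for y in strings_of_length(regex, n - k)
        )
    raise ValueError(f"unknown node kind: {kind!r}")


def enumerate_language(regex):
    """Lazily yield matched strings by length, then lexicographically.

    For a finite language this generator never finishes; use islice.
    """
    for n in count():
        yield from sorted(strings_of_length(regex, n))


def matches(regex, target):
    """Scan the enumeration up to len(target) and report whether target occurs."""
    for n in range(len(target) + 1):
        for candidate in sorted(strings_of_length(regex, n)):
            if candidate == target:
                return True
    return False


if __name__ == "__main__":
    # Corresponds to the regex (ab|c)*
    regex = ("star", ("alt", ("cat", ("lit", "a"), ("lit", "b")), ("lit", "c")))
    print(list(islice(enumerate_language(regex), 7)))
    print(matches(regex, "abc"), matches(regex, "ba"))
```

As LeoNerd says in the log, this is not meant to be efficient; it only shows why the length ordering makes the search terminate.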