
Download all of an author’s fictionpress stories

I was surprised that I could not find a script online to download all of an author’s stories from FictionPress or FanFiction.Net, so I threw together the one below.

If you go to an author’s page in a browser (only tested in Chrome) it should have all of their stories, and you can run the following script in the console (F12) to grab them all. Each chapter is saved as STORY_NAME_LINK_FORMAT - CHAPTER_NUMBER.html, where STORY_NAME_LINK_FORMAT is the story name as it appears in the story’s link. It works as follows:

  1. Gathers all of the names, chapter 1 links, and chapter counts for each story.
  2. Converts this information into a list of links it needs to download. The links are formed by using the chapter 1 link, and just replacing the chapter number.
  3. It then downloads all of the links to your current browser’s download folder.

Do note that Chrome should prompt you with “This site is attempting to download multiple files”, so of course, say yes. The script is also designed to detect problems, which would happen if FictionPress changes their HTML formatting.
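
Step 2 above can be sketched as a small standalone function. This is only an illustration of the link-building logic, not part of the script itself. The BuildChapterLinks name, its parameters, and the story ID and name in the example are made up, and it assumes chapter 1 links look like /s/STORY_ID/1/STORY_NAME.

```javascript
//Hypothetical helper (not part of the script below): expands one story's chapter 1 link into a [URL, save name] pair per chapter
function BuildChapterLinks(LinkStart, ChapterOneLink, NumChapters)
{
	//Assumed link format: /s/STORY_ID/1/STORY_NAME
	const StoryParts=/^\/s\/(\d+)\/1\/(.*)$/.exec(ChapterOneLink);
	if(!StoryParts)
		return null; //Unknown link format

	//Only the chapter number changes between links
	const Links=[];
	for(let i=1; i<=NumChapters; i++)
		Links.push([LinkStart+'/s/'+StoryParts[1]+'/'+i+'/'+StoryParts[2], StoryParts[2]+' - '+i+'.html']);
	return Links;
}

//Example with a made up story ID and name
console.log(BuildChapterLinks('https://www.fictionpress.com', '/s/123/1/My-Story', 2));
```

Returning null for an unrecognized link mirrors the “Bad link format” check in the script.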

//Gather the story information
const Stories=[];
$('.mystories .stitle').each((Index, El) =>
	Stories[Index]={Link:$(El).attr('href'), Name:$(El).text()}
);
$('.mystories .xgray').each((Index, El) =>
	Stories[Index].NumChapters=(/ - Chapters: (\d+) - /.exec($(El).text()) || [])[1] //Stays undefined on no match so the check below can report it
);

//Get links to all stories
const LinkStart=document.location.protocol+'//'+document.location.host;
const AllLinks=[];
$.each(Stories, (_, Story) => {
	if(typeof(Story.NumChapters)!=='string' || !/^\d+$/.test(Story.NumChapters))
		return console.log('Bad number of chapters for: '+Story.Name);
	const StoryParts=/^\/s\/(\d+)\/1\/(.*)$/.exec(Story.Link);
	if(!StoryParts)
		return console.log('Bad link format for stories: '+Story.Name);
	for(let i=1; i<=Story.NumChapters; i++)
		AllLinks.push([LinkStart+'/s/'+StoryParts[1]+'/'+i+'/'+StoryParts[2], StoryParts[2]+' - '+i+'.html']);
});

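//Download all the links
//A single added anchor is reused so the links already on the page are left untouched
const DownloadAnchor=$('<a>').appendTo('body');
$.each(AllLinks, (_, LinkInfo) =>
	DownloadAnchor.attr('download', LinkInfo[1]).attr('href', LinkInfo[0])[0].click()
);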

jQuery('.blurb.group .heading a[href^="/works"]').map((_, El) => jQuery(El).text()).toArray().join('\n');
Getting HTML from Simple Machine Forum (SMF) Posts

When I first created my website 10 years ago, from scratch, I did not want to deal with writing a comment system with HTML markup. And in those days, there weren’t plugins for everything like there are today. My solution was to set up a forum containing a topic for every Project, Update, and Post, and have my pages mirror the linked topic’s posts.

At the time, I just put in a quick hack that converted the links in the pulled SMF message’s body from bbcode (I may have hooked 1 other bbcode as well). I did this with regular expressions, which was nasty.

So anywho, I finally got around to writing a script that converts SMF messages’ bbcode to HTML and caches it. You can download it here, or see the code below. The script is optimized so that it only needs to load the SMF code when a post has not yet been cached. Caching happens the first time an SMF post is loaded within the script’s main function, and the cached HTML is regenerated if the post’s body changes.

The script requires that you first run the query on its line #3 against your SMF database (adjust the table name in the query to match your message table). Directly after that are 3 variables you need to set. The script assumes you are already connected to the database as the appropriate user. To use it, call “GFTP\GetForumTopicPosts($ForumTopicID)”. I have the functions split up so you can do individual posts too if needed (requires a little extra code).
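
The caching rule can be sketched on its own: the stored md5 is compared against the md5 of the current body, and the HTML is only regenerated on a mismatch. The sketch below is in JavaScript (run under Node.js for its built-in crypto module) rather than PHP, and GetPostHTML and the toy renderer are hypothetical stand-ins for ProcessPost and SMF’s parse_bbc, so treat it as an illustration only.

```javascript
const crypto=require('crypto'); //Node.js built-in module

function MD5(Text) { return crypto.createHash('md5').update(Text, 'utf8').digest('hex'); }

//Hypothetical stand-in for ProcessPost: Post has body, body_md5, and body_html fields
function GetPostHTML(Post, RenderBBCode)
{
	const CurrentMD5=MD5(Post.body);
	if(Post.body_md5!==CurrentMD5) //Body changed (or was never cached), so regenerate
	{
		Post.body_html=RenderBBCode(Post.body);
		Post.body_md5=CurrentMD5;
	}
	return Post.body_html;
}

//Usage with a toy renderer that only understands [b] tags
let RenderCount=0;
const ToyRender=(Body) => { RenderCount++; return Body.replace(/\[b\](.*?)\[\/b\]/g, '<b>$1</b>'); };
const Post={body:'[b]Hi[/b]', body_md5:null, body_html:null};
GetPostHTML(Post, ToyRender); //Renders and caches
GetPostHTML(Post, ToyRender); //Served from the cache
console.log(Post.body_html, RenderCount); //Prints: <b>Hi</b> 1
```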


<?
//This SQL command must be run before using the script
//ALTER TABLE smf_messages ADD body_html text, ADD body_md5 char(32) DEFAULT NULL;

namespace GFTP;

//Forum database variables
global $ForumInfo;
$ForumInfo=Array(
    'DBName'=>'YourDatabase_smf',
    'Location'=>'/home/YourUser/www',
    'MessageTableName'=>'smf2_messages',
);

function GetForumTopicPosts($ForumTopicID)
{
    //Change to the forum database
    global $ForumInfo;
    $CurDB=mysql_fetch_row(mysql_query('SELECT database()'))[0];
    if($CurDB!=$ForumInfo['DBName'])
        mysql_select_db($ForumInfo['DBName']);
    $OldEncoding=SetEncoding(true);

    //Get the posts
    $PostsInfos=Array();
    $PostsQuery=mysql_query('SELECT '.implode(', ', PostFields())." FROM $ForumInfo[MessageTableName] WHERE id_topic='".intval($ForumTopicID).
        "' AND approved=1 ORDER BY id_msg ASC LIMIT 1, 9999999");
    if($PostsQuery) //If query failed, do not process
        while(($PostInfo=mysql_fetch_assoc($PostsQuery)) && ($PostsInfos[]=$PostInfo))
            if(md5($PostInfo['body'])!=$PostInfo['body_md5']) //If the body md5s do not match, get new value, otherwise, use cached value
                ProcessPost($PostsInfos[count($PostsInfos)-1]); //Process the latest post as a reference

    //Restore from the forum database
    if($CurDB!=$ForumInfo['DBName'])
        mysql_select_db($CurDB);
    SetEncoding(false, $OldEncoding);

    //Return the posts
    return $PostsInfos;
}

function ProcessPost(&$PostInfo) //PostInfo must have fields id_msg, body, body_md5, and body_html
{
    //Load SMF
    global $ForumInfo;
    if(!defined('SMF'))
    {
        global $context;
        require_once(rtrim($ForumInfo['Location'], DIRECTORY_SEPARATOR).DIRECTORY_SEPARATOR.'SSI.php');
        mysql_select_db($ForumInfo['DBName']);
        SetEncoding();
    }

    //Update the cached body_html field
    $ParsedCode=$PostInfo['body_html']=parse_bbc($PostInfo['body']);
    $EscapedHTMLBody=mysql_escape_string($ParsedCode);
    $BodyMD5=md5($PostInfo['body']);
    mysql_query("UPDATE $ForumInfo[MessageTableName] SET body_html='$EscapedHTMLBody', body_md5='$BodyMD5' WHERE id_msg=$PostInfo[id_msg]");
}

//The fields to select in the Post query
function PostFields() { return Array('id_msg', 'poster_time', 'id_member', 'subject', 'poster_name', 'body', 'body_md5', 'body_html'); }

//Swap character encodings. Needs to be set to utf8
function SetEncoding($GetOld=false, $NewSet=Array('utf8', 'utf8', 'utf8'))
{
    //Get the old charset if required
    $CharacterVariables=Array('character_set_client', 'character_set_results', 'character_set_connection');
    $OldSet=Array();
    if($GetOld)
    {
        //Fill in variables with default in case they are not found
        foreach($CharacterVariables as $Index => $Variable)
            $OldSet[$Variable]='utf8';

        //Query for the character sets and update the OldSet array
        $Query=mysql_query('SHOW VARIABLES LIKE "character_%"');
        while($VariableInfo=mysql_fetch_assoc($Query))
            if(isset($OldSet[$VariableInfo['Variable_name']]))
                $OldSet[$VariableInfo['Variable_name']]=$VariableInfo['Value'];

        $OldSet=array_values($OldSet); //Turn back into numerical array
    }

    //Change to the new database encoding
    $CompiledSets=Array();
    foreach($CharacterVariables as $Index => $Variable)
        $CompiledSets[$Index]=$CharacterVariables[$Index].'="'.mysql_escape_string($NewSet[$Index]).'"';
    mysql_query('SET '.implode(', ', $CompiledSets));

    //If requested, return the previous values
    return $OldSet;
}
?>
Sending URLs as a file in an HTML form using AJAX
It is common knowledge that you can use the FormData class to send a file via AJAX as follows:
var DataToSend=new FormData();
DataToSend.append(PostVariableName, VariableData); //Send a normal variable
DataToSend.append(PostFileVariableName, FileElement.files[0], PostFileName); //Send a file
var xhr=new XMLHttpRequest();
xhr.open("POST", YOUR_URL, true);
xhr.send(DataToSend);

Something that is much less known, which doesn't have any really good full-process examples online (that I could find), is sending a URL's file as the posted file.
This is doable by downloading the file as a Blob, and then directly passing that blob to the FormData. The 3rd parameter to the FormData.append should be the file name.

The following code demonstrates downloading the file. I did not worry about adding error checking.
function DownloadFile(
    FileURL,     //http://...
    Callback,    //The function to call back when the file download is complete. It receives the file Blob.
    ContentType) //The output Content-Type for the file. Example=image/jpeg
{
    var Req=new XMLHttpRequest();
    Req.responseType='arraybuffer';
    Req.onload=function() {
        Callback(new Blob([this.response], {type:ContentType}));
    };
    Req.open("GET", FileURL, true);
    Req.send();
}
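
For completeness, here is a rough promise-based variant with basic error checking. It assumes a browser (or other runtime) that provides fetch. The DownloadFileAsBlob name and the optional FetchFunction parameter are hypothetical, and the parameter is only there so a different fetch implementation can be swapped in.

```javascript
//Hypothetical alternative to DownloadFile. Resolves with the file Blob, or rejects on a network or HTTP error
function DownloadFileAsBlob(
	FileURL,       //http://...
	FetchFunction) //Optional replacement for the global fetch
{
	return (FetchFunction || fetch)(FileURL).then(function(Response) {
		if(!Response.ok) //fetch only rejects on network failures, so HTTP errors are checked by hand
			throw new Error('Download failed with HTTP status '+Response.status);
		return Response.blob(); //The Blob type comes from the response's Content-Type header
	});
}
```

The resolved Blob can be passed to FormData.append in the same way as the Blob handed to the callback above.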

And the following code demonstrates submitting that file
//User Variables
var DownloadURL="https://www.castledragmire.com/layout/PopupBG.png";
var PostURL="https://www.castledragmire.com/ProjectContent/WebScripts/Default_PHP_Variables.php";
var PostFileVariableName="MyFile";
var OutputFileName="Example.png"; //Matches the type of the downloaded file
//End of User Variables

DownloadFile(DownloadURL, function(DownloadedFileBlob) {
    //Get the data to send
    var Data=new FormData();
    Data.append(PostFileVariableName, DownloadedFileBlob, OutputFileName);

    //Function to run on completion
    var CompleteFunction=function(ReturnData) {
        //Add your code in this function to handle the ajax result
        var ReturnText=(ReturnData.responseText ? ReturnData : this).responseText;
        console.log(ReturnText);
    }

    //Normal AJAX example
    var Req=new XMLHttpRequest();
    Req.onload=CompleteFunction; //You can also use "onreadystatechange", which is required for some older browsers
    Req.open("POST", PostURL, true);
    Req.send(Data);

    //jQuery example
    $.ajax({type:'POST', url:PostURL, data:Data, contentType:false, processData:false, cache:false, complete:CompleteFunction});
});

Unfortunately, due to the browser’s same-origin security policy, you can generally only use ajax to query URLs on the same domain unless the target server opts in with CORS headers. I use my Cross site scripting solutions and HTTP Forwarders for this. Stackoverflow also has a good thread about it.

Pulling HTML from Github markdown for external use
Although, converting to markdown is a time consuming pain

So I started getting on the Github bandwagon FINALLY. I figured that while I was going to the trouble of remaking readme files for the projects into github markdown files, I might as well duplicate the compiled HTML for my website.

The below code is a simple PHP script to pull in the converted HTML from Github’s API and then do some more modifications to facilitate directly inserting it into a website.


Usage:
  • The variables that can be updated are all at the top of the file.
  • The script will always output the finished result to the user’s browser, but can also optionally save it to an external file by setting the $SaveFileName variable.
  • Stylesheet:
    • The script automatically includes a specified stylesheet from the $StylesheetLocation variable.
    • The stylesheet I used is from https://gist.github.com/somebox/1082608. I’m not too happy with its coloring scheme, but it’ll do for now.
    • The required modifications that need to be made to the css are to change “body” to “.GHMarkdown”, and then add “.GHMarkdown” before all other rules.
    • This is the one I am currently using for my website, but it also has a few modifications made specifically for my layouts.
  • Modifications
    • In my markdowns, I like to link to internal sections by first creating a bookmark as “<div name="BOOKMARK_NAME">...</div>” and then linking via “[LinkName](#BOOKMARK_NAME)”. While this works on github, the bookmark names are actually changed to something like “user-content-BOOKMARK-NAME”, which is not usable outside of github. The first $RegexModifications item therefore changes the bookmarks back to their original names, and turns them into <span>s (which github does not support).
    • The second rule just removes the “aria-hidden” attributes, which my W3C checking scripts throw a warning on.
  • Note that sometimes the script may return the error “transfer closed with XXX bytes remaining to read”. This means that github denied the request (probably due to too many requests in too short a timespan), but because the input was so large, github terminated the connection prematurely. If this happens, try sending a tiny input and see if you get back a proper error.
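
To make the two modifications concrete, here is a small sketch of the same two rewrites in JavaScript instead of PHP. The CleanGithubHTML name is hypothetical, and the sample input is only my guess at the kind of HTML that comes back, so take it as an illustration of the patterns rather than of github’s exact output.

```javascript
//The same two rewrites as $RegexModifications, expressed as JavaScript regular expressions
function CleanGithubHTML(HTML)
{
	return HTML
		.replace(/<div name="user-content-(.*?)"(.*?)<\/div>/gs, '<span id="$1"$2</span>') //Restore the bookmark name and switch to a <span>
		.replace(/ ?aria-hidden="true"/g, ''); //Remove aria-hidden attributes
}

console.log(CleanGithubHTML('<div name="user-content-Install">Install</div><a aria-hidden="true" href="#x"></a>'));
//Prints: <span id="Install">Install</span><a href="#x"></a>
```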

<?php
//Variables
$SaveFileName='Output.html'; //Optionally save output to a file. Comment out to not save
$InputFile='Input.md';
$StylesheetLocation='github-markdown.css';
$RegexModifications=Array(
        '/<div name="user-content-(.*?)"(.*?)<\/div>/s'=>'<span id="$1"$2</span>', //Change <div name="user-content-XXX"...</div> ---TO--- <span id="XXX"...</span>
        '/ ?aria-hidden="true"/'=>'' //Remove aria-hidden attribute
);

//Set the curl options
$CurlHandle=curl_init(); //Init curl
curl_setopt_array($CurlHandle, Array(
        CURLOPT_URL=>           'https://api.github.com/markdown/raw', //Markdown/raw takes and returns plain text input and output
        CURLOPT_FAILONERROR=>   false,
        CURLOPT_FOLLOWLOCATION=>1,
        CURLOPT_RETURNTRANSFER=>1, //Return result as a string
        CURLOPT_TIMEOUT=>       300,
        CURLOPT_POST=>          1,
        CURLOPT_POSTFIELDS=>    file_get_contents($InputFile), //Pull in the requested file
        CURLOPT_HTTPHEADER=>    Array('Content-type: text/plain'), //Github expects the given data to be plaintext
        CURLOPT_SSL_VERIFYPEER=>0, //In case there are problems with the PHP ssl chain (often the case in Windows), ignore the error
        CURLOPT_USERAGENT=>     'Curl/PHP' //Github now requires a useragent to process the request
));

//Pull in the html converted markdown from Github
$Return=curl_exec($CurlHandle);
if(curl_errno($CurlHandle)) //Check for error
        $Return=curl_error($CurlHandle);
curl_close($CurlHandle);

//Make regex modifications
$Return=preg_replace(array_keys($RegexModifications), array_values($RegexModifications), $Return);

//Generate the final HTML. It will also be output here if not saving to a file
header('Content-Type: text/html; charset=utf-8');
if(isset($SaveFileName)) //If saving to a file, buffer output
        ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Markdown pull</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="<?=$StylesheetLocation?>" rel=stylesheet type="text/css">
</head><body><div class=GHMarkdown>
<?=$Return?>
</div></body></html>
<?php
//Save to a file if requested
if(isset($SaveFileName))
        file_put_contents($SaveFileName, ob_get_flush()); //Actual output happens here too when saving to a file
?>
Format Text [to HTML] Script

After writing the documentation in plaintext format for DSQL just now, I needed to convert it into HTML for the project’s page. I’ve done this before manually and it’s always very daunting, so I decided to really quickly write a script to do most of the work for me. It can be downloaded here, or you can see the code below.

It has the following:
  • Input text box with HTML data that is instantly rendered as HTML in a section below it whenever it is modified.
    Both sections take up half the vertical screen space
  • Undo/redo buffer for the text box (very primitive functionality)
  • “Open in new page” button, which opens a new window with just the HTML data (useful for validation [W3C or whatnot]).
    This is disabled by default because it is a dangerous option (XSS exploitable, so the script would need to be secured/password protected if this was on)
  • “Escape HTML” escapes HTML characters so they are not improperly interpreted (e.g. “<” becomes “&lt;”)
  • Listize:
    • Turns tabbed lists into HTML
    • For example:
      1
      	2
      	3
      		4
      5
      would become:
      1
      • 2
      • 3
        • 4
      5
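
The Listize behavior above can be sketched as a standalone function. This is a simplified variant of the idea, not the script’s own code: ListizeSketch is a hypothetical name, it strips the leading tabs, joins everything onto one line, and leaves out the @@@ continuation lines that the full script supports.

```javascript
//Simplified sketch of the Listize idea: the tab depth decides how many <ul> levels are open
function ListizeSketch(Text)
{
	let CurTabLevel=0;
	const Out=[];
	Text.split(/\r?\n/).forEach((Str, Index) => {
		const NewTabLevel=/^\t*/.exec(Str)[0].length, PreLevel=CurTabLevel;
		let Tags='';
		for(; NewTabLevel>CurTabLevel; CurTabLevel++) //Indent: open nested lists
			Tags+='<ul><li>';
		for(; NewTabLevel<CurTabLevel; CurTabLevel--) //De-dent: close nested lists
			Tags+='</li></ul>';
		if(NewTabLevel==0) //Consecutive top level lines are separated by breaks
			Tags+=(Index && PreLevel==0 ? '<br>' : '');
		else if(PreLevel>=NewTabLevel) //Next item at a level that is already open
			Tags+='</li><li>';
		Out.push(Tags+Str.slice(NewTabLevel));
	});
	while(CurTabLevel--) //Close anything still open
		Out.push('</li></ul>');
	return Out.join('');
}

console.log(ListizeSketch('1\n\t2\n\t3\n\t\t4\n5'));
//Prints: 1<ul><li>2</li><li>3<ul><li>4</li></ul></li></ul>5
```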


I realized while making the script that I should probably instead just start writing my documentation in a markup language (like GitHub’s markdown) and then have that converted to HTML and text files. Oh well.



Code:
<? header('Content-Type: text/html; charset=utf-8'); ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Format Text</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<?
$AllowRenderText=false; //Set to true only if this is in a secure environment, as directly outputting a given value can lead to XSS
if(isset($_REQUEST['RenderText']))
	return print '</head><body>'.($AllowRenderText ? $_REQUEST['RenderText'] : 'Rendering of text not allowed').'</body></html>';
?>
<style type="text/css">
html, body { width:100%; height:100%; margin:0; padding:0; }
.HalfScreen { display:block; width:calc(100% - 2px); height:calc(50% - 2px - 30px/2); margin:0; border:1px solid black; }
#RenderForm { overflow:hidden; }
#RenderText { margin:0; border:0; width:100%; height:100%; }
#RenderHTML { overflow-x:hidden; overflow-y:scroll; }
.TopBar { height:30px; background-color:grey; }
.Hide { position:absolute; visibility:hidden; top:-10000px; }
</style>

<script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
<script type="text/javascript">$(document).ready(function() {

//History for undoing
var UndoBuf=[], RedoBuf=[];
function Undo()
{
	if(!UndoBuf.length)
		return;
	RedoBuf.push(UndoBuf.pop());
	$('#RenderText').val(UndoBuf[UndoBuf.length-1]);
	$('#RenderHTML').html(UndoBuf[UndoBuf.length-1]);
}
function Redo()
{
	if(!RedoBuf.length)
		return;
	$('#RenderText').val(RedoBuf[RedoBuf.length-1]);
	$('#RenderHTML').html(RedoBuf[RedoBuf.length-1]);
	UndoBuf.push(RedoBuf.pop());
}
$('#Undo').click(function(e) { e.preventDefault(); Undo(); });
$('#Redo').click(function(e) { e.preventDefault(); Redo(); });

//Render HTML
function Render() {
	//Do the render
	var MyVal=$('#RenderText').val();
	$('#RenderHTML').html(MyVal);

	//Save current value to the history
	//*Better history functionality here would be real nice (using smart currentTarget.selectionStart/End calculations), along with an undo/redo button, but not within the scope of this project
	if(RedoBuf.length) //Empty redo buffer
		RedoBuf=[];
	UndoBuf.push(MyVal);
	if(UndoBuf.length>100) //Limit history buffer
		UndoBuf.shift();
}
$('#RenderText').on('keypress paste', function(e) { setTimeout(Render, 1); }); //Automatic update on paste requires a timeout

//Open in new page
$('#OpenInNewPage').click(function(e) {
	e.preventDefault();
	$('#RenderForm').submit();
});

//Escape HTML
$('#EscapeHTML').click(function(e) {
	e.preventDefault();
	$('#RenderText').val(function(index, value) {
		$.each({"&amp;":/&/g, "&lt;":/</g, "&gt;":/>/g, "&quot;":/"/g, "&#039;":/'/g}, function(HTMLStr, ReplStr) {
			value=value.replace(ReplStr, HTMLStr); });
		return value;
	});
	Render();
});

//Listize based on tabbing
//If a successive line is tabbed over beyond the current, it is made inside a new nested list.
//Tabbing over more than once on a successive line will create multiple nests
//Having @@@ at the beginning of a line will include it in the previous line item, no matter the tabbing
//Make sure to have @@@ blank lines tabbed over to the proper nested level
$('#Listize').click(function(e) {
	//Get the text to replace
	e.preventDefault();
	var T=$('#RenderText').val();

	//Go over each line and if the next line is tabbed beyond it, make it a new nested list. Blank
	var CurTabLevel=0, NewLines=[]; //NewLines is 2 items per line: the new html tags and then the original string
	$.each(T.split(/\r?\n/), function(Index, Str) {
		//Check for a continued line item
		if(Str.substr(0, 3)=='@@@')
			return NewLines.push('<br>', Str.substr(3));

		//In/de-dent as needed
		var Tags='';
		var NewTabLevel=/^\t*/.exec(Str)[0].length, PreLevel=CurTabLevel; //Get the nested level
		for(;NewTabLevel>CurTabLevel;CurTabLevel++)
			Tags+='<ul><li>';
		for(;NewTabLevel<CurTabLevel;CurTabLevel--)
			Tags+='</li></ul>';

		//Fill out the rest of the line
		if(NewTabLevel==0) //Breaks between top level new lines
			Tags+=(Index && PreLevel==0 ? '<br>' : '');
		else if(PreLevel>=NewTabLevel) //If previous item needs to be ended (new level is not greater and not 0)
			Tags+='</li><li>';

		NewLines.push(Tags, Str);
	});

	//Finish de-dent as needed
	var Final=[NewLines.shift()];
	var EndLine='';
	while(CurTabLevel--)
		EndLine+='</li></ul>';
	NewLines.push(EndLine);

	//Combine each line with the tags
	for(var i=0;i<NewLines.length;i+=2)
		Final.push(NewLines[i+0]+NewLines[i+1]);



	//Update from the replaced text
	$('#RenderText').val(Final.join("\n"));
	Render();
});

});</script>

</head>
<body>
	<div class=TopBar>
		<input type=button id=EscapeHTML value="Escape HTML">
		<input type=button id=Listize value="Listize">
		<? if($AllowRenderText) { ?> <input type=button id=OpenInNewPage value="Open In New Page"> <? } ?>
		<input type=button id=Undo value="Undo">
		<input type=button id=Redo value="Redo">
	</div>
	<form action="FormatText.php" method=post id=RenderForm target="_blank" class=HalfScreen>
		<textarea id=RenderText name=RenderText></textarea>
		<input type=submit class=Hide>
	</form>
	<div id=RenderHTML class=HalfScreen></div>
</body>
</html>
Encoding & decoding HTML in JavaScript with jQuery

Here are a few functions I’ve been finding a lot of use for lately. They are basically the JavaScript equivalents of PHP’s htmlentities and html_entity_decode. These functions are useful for inserting HTML dynamically, and for getting the values of contentEditable fields. They also convert line breaks appropriately, and HTML2Text removes a trailing line break.


var TextTransformer=$('<div></div>');
function Text2HTML(T) { return TextTransformer.text(T).html().replace(/\r?\n/g, '<br>'); }
function HTML2Text(T) { return TextTransformer.html(ReplaceBreaks(T, "\x01br\x01")).text().replace(/\x01br\x01/g, "\n").replace(/\n$/, ''); }
function ReplaceBreaks(TheHTML, ReplaceText) { return TheHTML.replace(/<\s*br\s*\/?\s*>/gi, ReplaceText || ' - '); }
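
For places where jQuery or a DOM is not available, a rough approximation can be written with plain string replacements. The two function names below are hypothetical, and the sketch only handles &, <, > and line breaks. Unlike the jQuery versions above, it does not strip other tags or decode other entities.

```javascript
//DOM-free approximations (hypothetical names). Only &, <, > and line breaks are handled
function Text2HTMLBasic(T)
{
	return T.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/\r?\n/g, '<br>');
}
function HTML2TextBasic(T)
{
	return T.replace(/<\s*br\s*\/?\s*>/gi, '\n').replace(/\n$/, '') //Breaks become new lines, minus a trailing one
		.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&'); //&amp; is decoded last so "&amp;lt;" ends up as "&lt;"
}

console.log(Text2HTMLBasic('a < b\n& c')); //Prints: a &lt; b<br>&amp; c
console.log(HTML2TextBasic('a &lt; b<br>&amp; c')==='a < b\n& c'); //Prints: true
```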